Methods and systems for searching and associating information resources such as web pages

ABSTRACT

A method manages information resources in a computer system. The method includes receiving user information from an input device. The user information is representative of a declaration that a second resource accessible by the system should be associated with a first resource. The method further includes storing, in association with the second resource, an identifier of the first resource. The method identifies other resources that are relevant with respect to the second resource using a relevance scoring process. When one of the other resources is to be accessed by the system for display, the method further determines whether the second resource, with respect to which that other resource is relevant, has an identifier of a first resource associated therewith; if it does, the method displays signaling information, distinct from the display of the other resource itself, signaling the existence of the first resource.

The present invention relates in a general manner to methods and systems for managing resources such as web pages accessible via the Internet, or any other types of documents, aimed on the one hand at improving the obtaining of resources that are “close” to given resources, in terms in particular of centers of interest for the user, and aimed on the other hand at allowing the user, in a particularly simple and intuitive manner, to effect associations between resources himself, especially so as to benefit therefrom during the obtaining of close resources.

STATE OF THE ART

The quantity of information potentially relevant for each individual is becoming such that the present procedures for storing and searching for information are scarcely adequate. Alongside systems making it possible to retrieve information organized explicitly (such as “favorites”) or by key words (via a search engine), it would be desirable to have available a method which spontaneously proposes context dependent relevant information.

Systems which provide relevant links (or rather “related links” to use the jargon) with respect to a current page visited on the web are known. Typically these systems comprise an extension to the Internet browser which communicates with a remote server which provides the relevant links as a function of the current page presented in the browser's main window. Typically these links are presented, in the form of a list of URLs, in a window adjacent to the browser's main window.

However, such systems are not extended to serve as associative memory.

SUMMARY OF THE INVENTION

An object of the present invention is to propose computer methods and systems for searching for resources (especially web pages, diverse computer documents) that are “close” to given resources (this notion of closeness being made explicit later), and methods for the associative management of resources.

In particular, the invention is aimed at characterizing information elements with respect to new pages which appear on the web, thus opening up the way to multiple new applications of dynamic management of content with respect to the user's browsing context.

More precisely, it is the aim of the invention that each information element be associated with links to relevant web pages which characterize it and which are automatically maintained up to date. It is thus possible to characterize nontextual information, such as photos, sounds and animations (in flash, etc.) and dynamically select the elements to be presented to the user as a function of the context of his browsing, which is also characterized by sets of relevant web pages. This approach is suitable especially, but not exclusively, for magazines in the art of living, fashion and in all other areas of “taste” where it is difficult to characterize through key words the interest shown by the subscriber in an item of information (when for example it represents a piece of music, a piece of art, a culinary dish, etc.).

Another object of the invention is to associate other targeted elements, such as targeted advertisements, with information elements, in exchange for an innovative associative memory service offered to surfers.

In particular, the aim is that, typically by means of an extension of their browser (extension downloadable from a given website), users can use the information elements of this site as “associative memory”. Thus, during the user's browsing, the most relevant element of the site with respect to the web page visited (as well as with respect to the browsing context) will be presented to him spontaneously; the user will then be able to drag and drop onto this element any resource from his computer, such as the icon of a file of the client station, or else the URL of a web page, so as to store it. Thereafter, each time he visits any web page which is relevant with respect to this element, the resource that he had stored will be presented to him spontaneously, together with the resources (such as advertisements) that the author of the element had himself associated with the element. The advertisements presented will thus correspond to the current centers of interest of the user and are provided in exchange for a new associative memory service.

The invention is aimed moreover at harnessing modern user interfaces to create, in a particularly simple and intuitive manner, associations between information resources (web pages, or document files) especially within the framework of the above objectives.

The invention proposes according to a first aspect a method for determining relevant additional resources with respect to a given set of starting resources, characterized in that it comprises the following steps:

-   a) identifying a set of citing resources that consists of all the resources having a link to at least one of the starting resources,
-   b) forming a set of candidate resources that consists of the set of resources cited by the citing resources,
-   c) for each candidate resource, calculating a candidate resource relevance score between said candidate resource and the set of starting resources on the basis of the existence of links situated in the citing resources and directed toward the candidate resource and toward the starting resources, and on the basis also of citing resource relevance scores assigned to each of the citing resources,
-   d) for each citing resource, recalculating a citing resource relevance score on the basis of the existence, in the citing resource in question, of links to the candidate resources and on the basis also of the candidate resource relevance scores allocated to the candidate resources in step c),
-   e) repeating, as appropriate, step c) and step d) one or more times, followed by step c),
-   f) determining said relevant additional resources as being the candidate resources which exhibit the best candidate resource relevance scores (and as appropriate also the citing resources which exhibit the best citing resource relevance scores).

The relevance score calculation performed in step c) comprises the calculation of a plurality of sums of citing resource relevance scores, each sum advantageously comprising only the relevance scores of the citing resources comprising a link to a given resource consisting of the candidate resource or a starting resource.

In a preferred manner, the above method also comprises the calculation of at least one sum of citing resource relevance scores, each sum comprising only the relevance scores of the citing resources comprising a link to one among a set of at least two given resources, this set comprising the candidate resource and at least one starting resource.
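Steps a) to f) can be sketched as follows. This is only an illustration under stated assumptions: the link graph layout (`links`), the ratio used in step c) and the averaging rule used in step d) are hypothetical choices, since this summary does not fix the exact formulas. For each starting resource, the sketch divides the sum of the scores of the citing resources that link to both the candidate and that starting resource by the sum of the scores of those linking to either of them.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # resource -> resources it links to (hypothetical layout)


def candidate_scores(links: Graph, starting: Set[str], candidates: Set[str],
                     citing_score: Dict[str, float]) -> Dict[str, float]:
    """Step c): score each candidate from sums of citing-resource scores."""
    def total(pred) -> float:
        # Sum of the scores of the citing resources whose links satisfy pred.
        return sum(s for r, s in citing_score.items() if pred(links[r]))

    scores = {}
    for c in candidates:
        ratios = []
        for s in starting:
            both = total(lambda out: c in out and s in out)
            either = total(lambda out: c in out or s in out)
            ratios.append(both / either if either else 0.0)  # assumed formula
        scores[c] = sum(ratios) / len(ratios)
    return scores


def relevant_resources(links: Graph, starting: Set[str],
                       rounds: int = 1) -> List[Tuple[str, float]]:
    # a) citing resources: every resource linking to a starting resource
    citing = {r for r, out in links.items() if out & starting}
    # b) candidates: everything cited by the citing resources
    candidates = set().union(*(links[r] for r in citing)) - starting
    citing_score = {r: 1.0 for r in citing}
    for _ in range(rounds):
        cand = candidate_scores(links, starting, candidates, citing_score)  # c)
        # d) assumed rule: a citing resource scores the mean of its candidates
        for r in citing:
            cited = links[r] & candidates
            citing_score[r] = (sum(cand[c] for c in cited) / len(cited)
                               if cited else 0.0)
    # e) the iteration ends with a final pass of step c)
    cand = candidate_scores(links, starting, candidates, citing_score)
    # f) best candidates first (ties broken by name for determinism)
    return sorted(cand.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy graph: h1, h2 and h3 cite the starting resource "s"; "x" does not.
web = {"h1": {"s", "a", "b"}, "h2": {"s", "a"}, "h3": {"s", "c"}, "x": {"c"}}
ranking = relevant_resources(web, {"s"}, rounds=1)
```

In this toy graph, candidate `a` ranks first because two of the three citing resources link to it alongside the starting resource.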

According to a second aspect, the invention proposes a method for determining relevant additional resources with respect to a given set of starting resources, characterized in that it comprises the following steps:

-   a) identifying a set of cited resources that consists of all the resources to which at least one of the starting resources has a link,
-   b) forming a set of candidate resources that consists of the set of resources citing the cited resources,
-   c) for each candidate resource, calculating a candidate resource relevance score between said candidate resource and the set of starting resources on the basis of the existence of links situated in the candidate resource and in the starting resources and directed toward the cited resources, and on the basis also of cited resource relevance scores assigned to each of the cited resources,
-   d) for each cited resource, recalculating a cited resource relevance score on the basis of the existence, in the candidate resources, of links to the cited resource in question and on the basis also of the candidate resource relevance scores allocated to the candidate resources in step c),
-   e) repeating, as appropriate, step c) and step d) one or more times, followed by step c),
-   f) determining said relevant additional resources as being the candidate resources which exhibit the best candidate resource relevance scores (and as appropriate also the cited resources which exhibit the best cited resource relevance scores).

The invention furthermore proposes a system for browsing among information resources, each resource comprising at least one link activatable in a first mode by an input device so as to bring about access to another information resource designated by a resource identifier associated with this link, characterized in that at least certain resources comprise at least one link activatable in a second mode with the aid of an input device so as to send to an engine for searching for new information resources a search query containing the resource identifier associated with the link in question.

This system exhibits the following preferred but optional aspects:

-   the input device is able to activate the link simultaneously in the first and second modes.
-   the activation of the link in the second mode is able to bring about the displaying of a pre-existing query, to which the resource identifier associated with the link in question is able to be added.
-   the activation of the link in the second mode is able to display, in addition to the pre-existing query, the information resource designated by said resource identifier.

The invention also proposes a system for searching for new information resources on the basis of existing information resources, characterized in that it comprises a search engine based on the analysis of links between the various resources and accepting as input a query comprising a series of resource identifiers, a means of selecting identifiers which is able to store a set of identifiers (URI) of resources selected one after the other by a user, and a user activatable query generating means for devising a query containing the set of identifiers previously selected destined for the search engine.

In a preferred but nonlimiting manner, the means of selection is able to store the identifiers selected in a remanent manner, in such a way that the means of selection can be implemented in a manner staggered over time with a view to the generation of one and the same query.

The invention moreover proposes a method of searching for new information resources on the basis of existing information resources, characterized in that it comprises the implementation of a search engine based on the analysis of links between various resources and accepting as input a query comprising a series of resource identifiers and in that it comprises the following steps:

-   selection of identifiers (URI) of resources one after the other by a user;
-   generation of a query containing the set of identifiers previously selected destined for the search engine.

There is also proposed a method of searching for new information resources on the basis of existing information resources, characterized in that it comprises the implementation of a search engine based on the analysis of links between various resources and accepting as input a query comprising a series of resource identifiers and in that it comprises the following steps:

-   generation of a query containing a set of identifiers of resources previously stored in one and the same group of resource identifiers individual to a user, destined for the search engine,
-   generation of a signaling for the attention of the user when at least one new resource identifier belonging to the group in question has been found by the engine.

According to a preferred aspect of the above method, each group of resource identifiers is represented by a graphical object on a display device of the user, and said signaling is carried out at least by a change of appearance of this graphical object.

The invention furthermore proposes a method of managing resources in a computer system provided with a display screen and with an input device for cursor movement and actuation such as a mouse, each resource possessing a representation displayed on the screen in such a way as to be able to be moved with the aid of the input device, method characterized in that it comprises the following steps:

-   movement of the representation of a first resource so as to bring it above the representation of a second resource,
-   followed by storage, in an associative memory for managing resources, of information of association between the first and second resources.

Certain preferred, but optional, aspects of this method are the following:

-   the movement step is performed by a drag and drop technique.
-   the method furthermore comprises, subsequent to the identification of a given resource in a resource consultation process, the following steps:
    -   reading of the associative memory for managing resources to determine whether other resources are associated with said given resource, and
    -   if so, signaling on the display screen of the existence of the associated resource or resources.
-   the resources comprise files.
-   the resources comprise resources accessible via a network such as the Internet.
-   the identification of a given resource is obtained via a process for identifying similar or relevant resources with respect to at least one starting resource.
-   in the case where the reading of the associative management memory determines the existence of several associated resources, the signaling step comprises the ordered signaling of at least part of said several associated resources.
-   the ordered signaling is based on the determination of relevance scores of said associated resources.
-   the associative memory for managing resources is contained in a server accessible from a plurality of individual stations in which the movement step can be implemented.
-   the associations between resources are stored user by user.
-   the associations between resources are stored in a mutualized manner between several users.
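The storage and signaling steps above might be modeled as below. This is a minimal sketch, not the claimed implementation: the class name `AssociativeMemory`, its method names and the convention that `user=None` stands for the mutualized (shared) variant are all assumptions introduced for illustration, and the relevance scores are supplied by the caller.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple


class AssociativeMemory:
    """Hypothetical in-memory store of associations between resources."""

    def __init__(self) -> None:
        # (user, second resource) -> first resources dropped onto it
        self._store: Dict[Tuple[Optional[str], str], Set[str]] = defaultdict(set)

    def on_drop(self, first: str, second: str, user: Optional[str] = None) -> None:
        """Record that `first` was dragged and dropped onto `second`.

        user=None models associations mutualized between several users.
        """
        self._store[(user, second)].add(first)

    def signal(self, resource: str, user: Optional[str] = None,
               scores: Optional[Dict[str, float]] = None) -> List[str]:
        """Return the resources associated with `resource`, best score first."""
        found = (self._store.get((user, resource), set())
                 | self._store.get((None, resource), set()))
        scores = scores or {}
        return sorted(found, key=lambda r: (-scores.get(r, 0.0), r))


memory = AssociativeMemory()
memory.on_drop("file:///notes.txt", "http://example.org/a", user="alice")
memory.on_drop("http://example.org/b", "http://example.org/a")  # shared
for_alice = memory.signal("http://example.org/a", "alice",
                          scores={"http://example.org/b": 0.9})
for_bob = memory.signal("http://example.org/a", "bob")
```

Keying the store by user keeps the per-user and mutualized variants in one structure; a server-side version would replace the dictionary with persistent storage.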

The invention also proposes a method for identifying, on the basis of a text resource, part of said resource able to constitute a pertinent query for a search engine, characterized in that it comprises the following steps:

-   removing the nonpertinent words from the text;
-   establishing and completing a memory of links between parts of said text, where a part is linked to another when it contains at least one pertinent word in common;
-   implementing a method of determining resource scores by analysis of a graph of resource nodes connected by links, where each resource used in this method consists of a part of the text, on the parts of the text that are thus interconnected;
-   using at least one of the text parts consisting of the candidate resources determined by said method as query text or as basis for a query text.

Advantageously, the step of implementing the method for distilling resources is performed only with text parts selected as prevalent, where the citing text parts are the text parts which comprise at least one word in common with the prevalent text part or parts, where a link is created from each citing text part to the prevalent text part or parts, where the text parts containing at least one word also contained in the citing text parts are identified, so as to form a group of co-cited text parts, and where a link is temporarily created from each citing text part to each co-cited text part with which said citing text part possesses at least one word in common.

The text parts are typically phrases.
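The linking of text parts by shared pertinent words can be illustrated with a short sketch. Two simplifications are assumed here and are not taken from the description: the stop-word list `STOP_WORDS` is a hypothetical stand-in for the removal of nonpertinent words, and the graph analysis is replaced by a plain count of linked parts (node degree) rather than the distillation method referred to above.

```python
import re
from itertools import combinations
from typing import List, Set

# Hypothetical list of nonpertinent words; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "it", "on"}


def pertinent_words(part: str) -> Set[str]:
    """Lower-case words of a text part, minus the nonpertinent ones."""
    return {w for w in re.findall(r"[a-z]+", part.lower()) if w not in STOP_WORDS}


def best_query_part(text: str) -> str:
    """Return the phrase sharing pertinent words with the most other phrases."""
    parts: List[str] = [p.strip() for p in re.split(r"[.!?]", text) if p.strip()]
    if not parts:
        return ""
    words = [pertinent_words(p) for p in parts]
    degree = [0] * len(parts)
    # Two parts are linked when they have at least one pertinent word in common.
    for i, j in combinations(range(len(parts)), 2):
        if words[i] & words[j]:
            degree[i] += 1
            degree[j] += 1
    best = max(range(len(parts)), key=lambda i: degree[i])
    return parts[best]


sample = ("Jazz festivals attract crowds. The saxophone is a jazz instrument. "
          "Crowds love the saxophone. It rained on Monday. Festivals sell tickets.")
query = best_query_part(sample)
```

The selected phrase could then serve as query text, or as the basis for one, in the sense of the last step of the method.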

According to another aspect, the invention proposes a method of managing information resources such as web pages in a computer system comprising a user station furnished with a display screen, each resource possessing an identifier (URI) allowing its access from the user station, method characterized in that it comprises the following steps:

-   a) declaration by the user of an association between two resources, by associating with a second resource the identifier of a first resource;
-   b) identification of other relevant resources with respect to the second resource; and
-   c) during access to one of the other resources (current page), signaling of the existence of the first resource.

According to certain preferred but nonlimiting aspects:

-   step b) comprises the selection of other resources that are most relevant for the implementation of step c).
-   step a) is implemented for a plurality of second resources belonging to a group, and step b) comprises the identification of other relevant resources with respect to the set of second resources of the group.
-   step b) is triggered by the carrying out of step a).
-   step b) is implemented subsequently to the access envisaged in step c) to determine whether the other resource which has been accessed is another relevant resource with respect to the second resource.
-   step b) is implemented by supplying an identifier of the second resource to a server for determining relevant resources.
-   step b) is implemented by identifying other relevant resources with respect to at least one intermediate resource (spot) with respect to which the second resource is predetermined as being relevant.
-   the method furthermore comprises the displaying, in the vicinity of an area for displaying resources, of representations of links to at least certain among the first resources, the intermediate resources, and relevant resources with respect to the intermediate resources.
-   step a) is implemented by acting with the aid of an input device on graphical objects representative of the first and second resources.

The invention moreover proposes a method for identifying information resources accessible via recent links (such as web pages), relevant with respect to at least one given resource, characterized in that it comprises the following steps:

-   applying a query comprising an identifier of said given resource to a system for determining relevance between resources,
-   selecting a first set of resources that are the most relevant (e.g. best hub scores) with respect to said given resource,
-   searching, through each of the most relevant resources, for the regions possessing links to other resources of high average relevance, so-called relevant regions,
-   monitoring the appearance, in said relevant regions, of new links which point to resources which were not yet known to the system, so-called new resources,
-   selecting a second set of resources having a high relevance (e.g. best hypertext authority scores) with respect to said given resource,
-   selecting the new resources which have the highest similarity of content with respect to the resources of said second set of resources and assigning to the new resources selected a relevance level (similarity authority score), dependent on time, as a function of said similarity of content.

According to yet another aspect, the invention proposes a method for allowing access by a user to relevant information entities from a starting information entity, each information entity being accessible via an identifier (URI), characterized in that it comprises the following steps:

-   a) providing at least one similar information entity, exhibiting a content similar to that of the starting entity, and determining the identifier of the or of each similar information entity, and
-   b) determining on the basis of the or each similar information entity identifier a set of one or more identifiers of information entities relevant with respect to the or each similar information entity.

Preferred, but nonlimiting aspects of the above method are as follows:

-   the method furthermore comprises the following step:
    -   c) allowing the user to access at least certain relevant information entities from their respective identifiers.
-   the method furthermore comprises the following step:
    -   d) on the basis of the relevant information entity identifiers and of a given set of extra information entities, selecting the extra entities that are most similar to the relevant information entities.
-   the method comprises an extra step of sorting the relevant information entities by degree of relevance.
-   the sorting step is preceded by a step of calculating a relevance score with respect to the or each similar information entity for each of the relevant information entities.
-   each information entity consists of a page fragment written in a standardized mark-up language, or of such a page as a whole.
-   each identifier consists of a uniform resource identifier (URI) of the fragment or of the page.
-   step a) is carried out by selection by the user of one or more information entities similar to the starting information entity.
-   step a) is carried out by implementing a process for automatically determining similar information entities.
-   step a) is carried out by implementing a process for automatically determining similar information entities, followed by a selection by the user of one or more similar information entities from among the similar information entities determined by said process.
-   step b) is carried out by implementing a process for automatically determining relevant information entities.
-   the process for automatically determining relevant information entities comprises the analysis of a graph structure of identifiers that consists of the identifiers of information entities and of the identifiers designated by user activatable links contained in said information entities.

According to another aspect of the invention, a method for determining relevance scores of text units such as phrases in a textual document comprises the following steps:

-   decomposition of the document into a plurality of text units,
-   selection of at least one relevant text unit and of candidate text units,
-   determination of the set of pertinent words contained in the relevant text unit (or units) and in each of the candidate text units,
-   for each pertinent word contained in the relevant text unit (or units), identification of the candidate text units citing this pertinent word, to form a group of citing text units,
-   identification of the candidate text units containing at least one pertinent word also cited in the citing text units, to form a group of co-cited text units,
-   assigning to the co-cited text units a relevance score as a function of said citations.
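The grouping into citing and co-cited text units can be made concrete with a small sketch. It assumes the document has already been decomposed and reduced to sets of pertinent words, and it uses a hypothetical scoring rule (the number of citing units with which a unit shares a word, the unit itself excluded); the method above only requires that the score be some function of the citations.

```python
from typing import Dict, Set


def co_cited_scores(relevant_words: Set[str],
                    candidates: Dict[str, Set[str]]) -> Dict[str, int]:
    """Score co-cited text units from their word overlap with citing units.

    relevant_words: pertinent words of the relevant text unit (or units).
    candidates: candidate unit name -> its pertinent words.
    """
    # Citing units: candidates containing a pertinent word of the relevant unit.
    citing = {u for u, words in candidates.items() if words & relevant_words}
    scores: Dict[str, int] = {}
    for unit, words in candidates.items():
        # Assumed rule: count the other citing units sharing a word with `unit`.
        n = sum(1 for c in citing if c != unit and candidates[c] & words)
        if n:
            scores[unit] = n  # `unit` belongs to the co-cited group
    return scores


units = {
    "u1": {"jazz", "crowds"},
    "u2": {"saxophone", "reed"},
    "u3": {"crowds", "tickets"},
    "u4": {"reed", "crowds"},
    "u5": {"weather"},
}
unit_scores = co_cited_scores({"jazz", "saxophone"}, units)
```

Here `u1` and `u2` are the citing units; `u4` shares a word with both and therefore scores higher than `u3`, which shares a word with only one.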

The invention also proposes a method for determining relevance scores of text units such as phrases in a textual document, characterized in that it comprises the following steps:

-   decomposition of the document into a plurality of text units,
-   selection of at least one relevant text unit and of candidate text units,
-   determination of the set of pertinent words contained in the relevant text unit (or units) and in each of the candidate text units,
-   for each pertinent word contained in the relevant text unit (or units), identification of the candidate text units comprising this pertinent word, to form a group of cited text units,
-   identification of the candidate text units containing at least one pertinent word also cited in the cited text units, to form a group of co-citing text units,
-   assigning to the co-citing text units a relevance score as a function of said citations.

The invention also proposes a method for determining scores allocated to words or groups of words contained in text units such as phrases in a textual document, characterized in that it comprises a step which consists in adding up the relevance scores, determined by one of the methods above, of the text units in which said words are located.
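The adding-up step can be written in a few lines. The sketch assumes that text-unit scores are already available (from whichever of the above methods is used) and that each unit is represented by its set of words; the function and variable names are illustrative only.

```python
from collections import defaultdict
from typing import Dict, Set


def word_scores(unit_words: Dict[str, Set[str]],
                unit_scores: Dict[str, float]) -> Dict[str, float]:
    """Score each word as the sum of the scores of the units containing it."""
    totals: Dict[str, float] = defaultdict(float)
    for unit, words in unit_words.items():
        for word in words:
            totals[word] += unit_scores.get(unit, 0.0)
    return dict(totals)


scored_words = word_scores(
    {"u3": {"crowds", "tickets"}, "u4": {"reed", "crowds"}},
    {"u3": 1.0, "u4": 2.0},
)
```

A word appearing in several well-scored units, such as `crowds` here, accumulates the scores of all of them.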

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1 to 7 of the appended drawings illustrate various steps implemented in the present invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Lexicon

Resource (or element): Information resource such as a web page, a part of a web page, a document, or an XML element. Each resource may itself consist of resources, thus forming a tree structure.

Current resource: Resource accessed by the user at the current moment during browsing (it is in particular the web page displayed in the main window of the browser).

URI (Uniform Resource Identifier): Resource address. Will sometimes be used as a synonym for URL (Uniform Resource Locator).

Link: URI placed in a resource. In general, by clicking on a link, the user can access the resource pointed at by it.

Cite (a first resource cites a second resource): the first resource possesses a link to the second resource.

Popular: Said of a resource which is accessed by a large number of users (for example on the web) from its URI.

Private resource: Resource that is not accessible by a large number of users (in particular which is not published on the web or is not widely known).

Associative storage: Addition of a link to a first resource, on a second resource, so as to be able to retrieve the first resource via the associative search method.

Associative search: In order to retrieve a first resource, access to a relevant resource with respect to a second resource to which a link to the first resource has been added.

Added link: URI inserted by the user into a set of associated links.

Proposed spot: Spot presented by the system as a priority since it comprises the associated links that are most relevant with respect to the current context.

Spot: A spot is composed:

-   of a set of links, in general associated with a reference resource. The resources pointed at by the associated links are accessible (for example on the web) from their respective URIs. The associated links are composed of given associated links and of completed associated links,
-   and (optionally) of one or more sets of links (in particular links added by the creator of the spot and links added by users of the spot), proposed to the user within the framework of the associative search method,
-   and (optionally) of a link to said reference resource, said associated links being selected as being relevant with respect to this reference resource.

Domain of relevance of a spot: set of resources designated by associated links of this spot.

Given associated links: Associated links specified explicitly (by whoever creates or publishes the resource with which said set is associated, or else by whoever creates a spot for this resource).

Completed associated links: Associated links determined automatically (in particular by means of a relative distillation algorithm described in the present description).

Associated link score: Score of relevance with respect to the set of given associated links. This score may be calculated by a relative distillation algorithm such as one of those described in the present description.

Authority score: Relevance score of a resource with respect to a set of given associated links.

Hub score: Relevance score of a resource citing other resources, representing the relevance of the cited resources with respect to a set of given associated links.

Noncontextual score: Context independent relevance score.

Contextual score: Context dependent relevance score.

Noncontextual spot: With respect to a resource (or to a set of resources) in question: Spot whose associated links comprise the URI of the resource in question (or at least some of the URIs of the resources in question) with a score (or a mean score) that is greater than a given threshold or that is selected in such a way as to maximize it (cf. the spot selection procedure described in the present description).

Contextual spot: Spot whose associated links are the most relevant with respect to the context.

Context: Browsing context.

Spot server: Server on the Internet providing the association between associated link and spot.

Current spot server: Spot server to which the user is directly connected.

Relevant region of a resource: Part of a resource containing at least one relevant link and containing no nonrelevant link.

Methods of Associative Storage and Associative Search

[Vocabulary used:

First page=page stored by the user so that he can retrieve it easily;

Second page=page used by the user as storage medium (to store an association with the first page, which we shall subsequently refer to as “for storing the first page” for the sake of conciseness);

Current page=page presented at the current moment in the main window of the Internet browser.

These are for example web pages, however the first page may be a private resource such as a document (text, multimedia or other document) which belongs to him].

The system allows the user to add a link to a first page on any second page whatsoever (or in the vicinity of the second page; we shall subsequently use the expression “on the second page” for the sake of conciseness).¹

¹ The step consisting in adding a link in this manner, on a second resource, to a first resource (so as to be able to retrieve it by the method described in this report) is called associative storage.

The user accesses the pages by means of a browser furnished with the system specific extension (or via an intermediate web server). Adding a link can be done for example by a drag and drop: the user grabs a handle representing the first page and drops it onto the second page; for example the link added is then presented by the system as a vignette in the style of a “post-it” in the place where it was dropped, or in a window adjacent to the main window of the browser (or in a frame adjacent to the frame presenting the original web page). He can also drop it on an icon representing the second page (for example in his favorite links). The system then stores, in relation to the user concerned, the association between the link to the first page and the second page in question.

Thereafter, when the user accesses a page relevant with respect to the second page (or the second page itself), the URI² of this added link to the first page is automatically presented to him.

² As well as optionally other indications pertaining to the link added, such as the text or the graphical object which accompanies the added link, or else a simplified or miniaturized presentation of the first page itself.

Thus, to retrieve the first page, the user merely has to access any page whatsoever³ which is relevant with respect to the second page.

³ Any such page must already be, or must come to be, taken into account by the system. The user will thus prefer to choose a popular page to speed up the search. The system is furnished with a crawler whose aim is precisely to take into account as many as possible of the accessible pages (especially on the Internet) which are of interest to the user.

More simply, in so far as:

-   the user chooses said second page because it is relevant with respect to the first page,
-   and the relevance relation is transitive at this level,

to retrieve the first page, the user merely has to access any page (accessible by the system) which is relevant with respect to the first page: this is the associative search method.⁴

⁴ To facilitate reading, the storage/associative search method is described here while speaking of pages, but the method applies more widely to resources (not only to pages).

Note that during the step of associative storage the user can increase his chances by adding a link to the first page on several second pages.

Furthermore, in so far as the relevance relations are symmetric, the added links are implicitly bi-directional. Furthermore, in the case where the current page is a private resource, the system can liken it to the second page(s) on which, as appropriate, the user had added a link to this private resource, and present the other first pages that he also added on this (these) second page(s).

The step of associative storage can be automated (or be computer aided). Specifically, the addition of a link to a first page on a second page can be (semi-) automated according to the following steps:

-   I. determine key words or main phrases of the first page (that are contained in the page or associated with it, for example delimited by “meta-tags”),
-   II. provide these key words or main phrases to a search engine which will return a set of links on pages containing these key words,
-   III. take at least one subset thereof (for example the best N according to the search engine) so as to use them as second pages,
-   IV. add a link to the first page on these second pages.

Note that as regards step I, various techniques for automatically extracting key words or main phrases of a text already exist.

The key words may also be extracted from the text in the following manner:

-   for each word, determine the score of this word by adding up the scores of the phrases in which it is located and then normalizing these scores (for example by dividing each score thus obtained by the square root of the sum of the squares of all the scores thus obtained);
-   select the words having the largest scores as key words.

The two methods presented above may be combined by retaining from the key words selected only those which are located in the phrases selected. The complete method for extracting the key words from the text is then as follows:

-   remove the nonpertinent words from the text (called “stop words” in the literature);
-   identify the links between the phrases: a phrase is linked to another when the two contain at least one word in common;
-   apply the absolute distillation procedure (described later), or an equivalent procedure utilizing a graph of links (such as PageRank), to the phrases thus interlinked, to determine their scores;
-   for each word, determine the score of this word by adding up the scores of the phrases in which it is located and then normalizing these scores;
-   select the phrases having the largest scores as being the main phrases of the text.
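The combined extraction procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the stop-word list is a toy one, phrases are split on sentence punctuation, and a plain PageRank-style power iteration stands in for the absolute distillation procedure (which is described later in this document). All function names and parameters are hypothetical.

```python
import math
import re

# Toy stop-word list (assumption; a real list would be far longer).
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "is", "in", "on", "it"}

def split_phrases(text):
    """Split a text into phrases, each reduced to its set of non-stop words."""
    phrases = []
    for raw in re.split(r"[.!?]+", text):
        words = {w for w in re.findall(r"[a-z]+", raw.lower()) if w not in STOP_WORDS}
        if words:
            phrases.append(words)
    return phrases

def score_phrases(phrases, damping=0.85, iterations=50):
    """PageRank-style stand-in for distillation over the phrase graph.
    Two phrases are linked when they share at least one word."""
    n = len(phrases)
    if n == 0:
        return []
    links = [[j for j in range(n) if j != i and phrases[i] & phrases[j]] for i in range(n)]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new = [(1.0 - damping) / n] * n
        for i, targets in enumerate(links):
            # A phrase without links spreads its score over all phrases.
            receivers = targets or range(n)
            share = damping * scores[i] / len(receivers)
            for j in receivers:
                new[j] += share
        scores = new
    return scores

def score_words(phrases, phrase_scores):
    """Sum the scores of the phrases containing each word, then L2-normalize."""
    totals = {}
    for words, score in zip(phrases, phrase_scores):
        for word in words:
            totals[word] = totals.get(word, 0.0) + score
    norm = math.sqrt(sum(v * v for v in totals.values())) or 1.0
    return {word: v / norm for word, v in totals.items()}

def extract(text, n_phrases=1, n_words=3):
    """Return (indices of main phrases, key words, all word scores)."""
    phrases = split_phrases(text)
    phrase_scores = score_phrases(phrases)
    main = sorted(range(len(phrases)), key=lambda i: phrase_scores[i], reverse=True)[:n_phrases]
    word_scores = score_words(phrases, phrase_scores)
    # Retain only the key words located in the selected phrases.
    allowed = set().union(*(phrases[i] for i in main))
    ranked = sorted((w for w in word_scores if w in allowed),
                    key=lambda w: word_scores[w], reverse=True)
    return main, ranked[:n_words], word_scores
```

For a text such as "Cats chase mice. Mice eat cheese. Dogs chase cats.", the first phrase shares words with both others, so it is selected as the main phrase and its words become the key words.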

As a variant, in so far as one or more phrases of the text may be labeled as being prevalent, to determine the scores of the phrases, instead of the absolute distillation procedure it is possible to use the relative distillation procedure (described later) to determine the relevance score of the phrases with respect to said prevalent phrases.

Moreover, instead of actual phrases, it is possible to consider any kind of text parts or units. The method using relative distillation thus consists in determining relevance scores of co-cited “text units” (such as phrases):

The text units comprising at least one word in common with the prevalent unit (or set of units) are identified so as to form a group of citing text units. A link is created (temporarily) from each citing text unit to the prevalent text unit (or set of units).

The text units containing at least one word also contained in the citing text units are identified so as to form a group of co-cited text units. A link is created (temporarily) from each citing unit to each co-cited unit with which said citing unit possesses at least one word in common.

One of the methods, described later, of calculating relevance scores by the relative distillation procedure is then applied. The whole set of identifiers of the prevalent text units constitutes the URIs of the query.⁵

⁵ The set of identifiers of the citing text units constitutes the set R⁻. The set of identifiers of the co-cited text units constitutes the set R⁻⁺, and so on and so forth.

The implementation of the associative search system will now be described.

To present, to a user who accesses a current page, links on first pages, the system performs the following steps:

-   Step a: determine the relevance score of second candidate pages with respect to the current page⁶,
-   Step b: select the (or a certain number of) second pages having (as appropriate) a sufficient relevance score,
-   Step c: present to the user the (URIs of the) first pages of the links that he had added on the second pages which have been selected in step b; optionally also present the (URIs of the) second pages themselves to him.⁷

⁶ This step is composed of step a and/or step a′ (see later . . . )

⁷ To do this, as already mentioned, the system possesses in memory the relation between user, second page (on which the user in question has added links) and first page (link added by the user in question on the second page in question). Thus the system can firstly determine the set of second candidate pages for the current user so as to perform step a, then in step c retrieve the added links to be presented to the user.

As a variant, during the associative storage, instead of adding on the second page a link to the first page, the user can overlay onto the second page or insert thereinto an annotation (or any resource such as an icon or other graphical object), which then plays the role of first page within the sense of the present method. In this case, during step c of the associative search, the system presents the second page or pages which have been selected while also presenting their annotations (or the resource that has been added to them).⁸

⁸ In the remainder of the description, the expression “link added on a second page” is understood to include this typical case where there is a resource added to the second page.

To facilitate reading, the following 7 sets (see FIG. 1) will be considered:

-   R consists of the pages⁹ of the query.
-   R⁻ is the set of pages which contain a link to¹⁰ at least one page of the query.
-   R⁻⁺ is the set of pages pointed at (cited) by at least one page of R⁻.
-   R⁻⁺⁻ is the set of pages which cite at least one page of R⁻⁺ (R⁻⊂R⁻⁺⁻).
-   R⁺ is the set of pages cited by at least one page of the query (R).
-   R⁺⁻ is the set of pages which cite at least one page of R⁺.
-   R⁺⁻⁺ is the set of pages cited by at least one page of R⁺⁻ (R⁺⊂R⁺⁻⁺).

⁹ (“page” is understood to mean “page URI”)

¹⁰ (stated otherwise “which cite”, or else “which point at”)
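The seven sets can be computed mechanically from a link graph. The sketch below assumes the graph is held in memory as a dictionary mapping each page URI to the set of URIs it cites; that representation and the function names are illustrative assumptions, not part of the method as described.

```python
def citing(links, pages):
    """Pages that contain a link to at least one page of `pages`."""
    return {src for src, targets in links.items() if targets & pages}

def cited(links, pages):
    """Pages pointed at by at least one page of `pages`."""
    return set().union(*(links.get(page, set()) for page in pages))

def neighbourhood(links, query):
    """Build the sets derived from the query R by following links
    backwards (minus) and forwards (plus)."""
    r_minus = citing(links, query)
    r_minus_plus = cited(links, r_minus)
    r_plus = cited(links, query)
    r_plus_minus = citing(links, r_plus)
    return {
        "R": set(query),
        "R-": r_minus,
        "R-+": r_minus_plus,
        "R-+-": citing(links, r_minus_plus),
        "R+": r_plus,
        "R+-": r_plus_minus,
        "R+-+": cited(links, r_plus_minus),
    }
```

With such a representation the two inclusions stated in the list (R⁻ within R⁻⁺⁻, and R⁺ within R⁺⁻⁺) hold by construction, which gives a cheap sanity check on the data.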

To determine the relevance score of the second candidate pages with respect to a current page R (understand R here as current resource¹¹), the system implements a method of “relative distillation” comprising at least one out of the following steps a and a′.

¹¹ Since here the query is formed of a single page.

Step a:

-   Step a-1: Identify the set R⁻ of pages which possess at least one link to R;¹²
-   Step a-2: Retrieve in memory the set of second candidate pages for the current user and perform the intersection between the set R⁻⁺ of the pages pointed at by the pages of R⁻ (note that R is in the set R⁻⁺) and the set of second candidate pages for the current user;
-   Step a-3: For each page of the set resulting from step a-2, calculate its relevance score (authority score) with respect to R. (Note that this step includes the identification of the set of pages of R⁻⁺⁻ possessing at least one link pointing to at least one subset of the set resulting from step a-2; see the “selecting the spots” section.)

¹² A web search engine can be used to determine the resources that point to a given resource.

Step a′:

-   Step a′-1: Identify the set R⁺ of pages pointed at by R;
-   Step a′-2: Retrieve in memory the set of second candidate pages for the current user and perform the intersection between the set R⁺⁻ of pages possessing at least one link to a page of R⁺ (note that R is in the set R⁺⁻) and the set of second candidate pages for the current user;
-   Step a′-3: For each page of the set resulting from step a′-2, calculate its relevance score (hub score) with respect to R. (Note that this step includes the identification of the set of pages of R⁺⁻⁺ pointed at by at least one subset of the set resulting from step a′-2.)
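Steps a-1 to a-3 might be sketched as below. The score used here, the fraction of pages of R⁻ that also cite the candidate (a normalized co-citation count), is only a stand-in chosen for illustration; the actual scoring equations are those of the “selecting the spots” section. The dictionary-based link graph and the function name are likewise assumptions.

```python
def step_a(links, current, candidate_second_pages):
    """Score the user's candidate second pages against the current page R.

    links: dict mapping a page URI to the set of URIs it cites.
    Returns a dict {second page: stand-in authority score in [0, 1]}.
    """
    # Step a-1: pages possessing at least one link to R.
    r_minus = {src for src, targets in links.items() if current in targets}
    # Step a-2: pages pointed at by R-, restricted to the user's candidates.
    r_minus_plus = set().union(*(links[src] for src in r_minus))
    candidates = r_minus_plus & candidate_second_pages
    # Step a-3 (stand-in): share of the citing pages that also cite the candidate.
    scores = {}
    for page in candidates:
        co_citers = sum(1 for src in r_minus if page in links[src])
        scores[page] = co_citers / len(r_minus)
    return scores
```

Step a′ is the mirror image: follow the links of R forwards to obtain R⁺, then backwards to obtain R⁺⁻, and score the candidates found there.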

The calculation of the relevance scores in steps a-3 and a′-3 may be performed by means in particular of one of the equations presented later in the “selecting the spots” section, which moreover describes improvements to the method presented above. In particular the scores are sharpened by successive iterations. During these iterations, the hub pages in step a and the authority pages in step a′ also acquire relevance scores (hub scores and authority scores respectively). In addition to the second candidate pages (that is to say in addition to the URIs of the pages of R⁻⁺ in step a and/or of R⁺⁻ in step a′) determined as described hereinabove, it is then also possible to include, in the resulting set provided at step b, the hub pages of step a and the authority pages of step a′ (since they now have relevance scores). Moreover the weights of the links between close pages¹³ are diminished so as to further improve the results.

¹³ To identify the closeness of the pages to the ends of the links the system additionally identifies the set of pages R⁻ of the pages possessing at least one link to the pages R⁻ and the set of pages R⁻⁺⁻ of the pages possessing at least one link to the pages R⁻⁺⁻ (see the “filtering” section).

The system can therefore select the second pages that are most (or sufficiently) relevant in step b and perform step c to present their added links to the user.

The results obtained by the relative distillation method may be stored (then maintained; see later the “maintaining the spots” section) with the aim of avoiding recalculating them during accesses to the current pages already processed. Thus, the system maintains, in a second memory, the scores of the second pages with respect to the current pages in the cases where these scores are greater than a given threshold. For a current page already processed, the response of the system is then almost immediate.

Stated otherwise, step a is modified as follows:

Step a′: Consult the second memory to ascertain whether the second pages most relevant for the current page have already been stored (and whether these data in memory are sufficiently fresh); if so, go to step c; otherwise determine and store the relevance score of second candidate pages with respect to the current page.

As a variant, the system stores (then maintains; see later the “maintaining the spots” section) the necessary data without waiting for a user to access a current page; storage is triggered by the use, by the user, of a new second page (as associative storage medium).

By utilizing the fact that the relevance scores are symmetric¹⁴, the system starts from each second page to construct R⁻ and R⁻⁺ (and R⁻⁺⁻) and/or R⁺ and R⁺⁻ (and R⁺⁻⁺), calculates by relative distillation the relevance scores of all the potential current pages, and stores them in a second memory (this being an inverse memory able to provide, for each potential current page, the second relevant pages).

¹⁴ (i.e. the relevance score of a second page with respect to a current page is equal to the relevance score of this current page with respect to this second page)

Moreover, as already indicated, the system maintains a first memory containing the added links, indexed by user and second page.¹⁵

¹⁵ Note that, advantageously, the data in the second memory are not per user and may thus serve all the users.

Thus, when a user actually accesses a current page, the system selects from the second memory the second pages, from among the second pages used by this user as storage medium¹⁶, which have the highest relevance scores with respect to said current page, then retrieves (from the first memory) the links added by this user on these second pages.

¹⁶ (they are indicated in the first memory)

Stated otherwise, the method comprises the following steps¹⁷.

¹⁷ Steps m1 and m2 describe the associative storage method, steps a, b and c describe the associative search method.

For each new second page R (on which a user adds a link)¹⁸:

¹⁸ Step m1 is performed only for the new second pages, while step m2 is performed each time a second page is used by a user, whether or not it is new for the system.

Step m1: Perform at least one of steps m1-1 and m1-1′, then perform step m1-2:

Step m1-1:

-   identify the set R⁻ of pages which possess at least one link to R;
-   identify the set R⁻⁺ of potential current pages pointed at by the pages of R⁻;
-   for each page of R⁻⁺ (except R) calculate its relevance score (authority score; see the “selecting the spots” section) with respect to R; note that this step includes the identification of the set of pages R⁻⁺⁻ possessing at least one link pointing at at least one subset of R⁻⁺ (see the “selecting the spots” section);

Step m1-1′:

-   identify the set R⁺ of pages to which R possesses at least one link;
-   identify the set R⁺⁻ of potential current pages pointing to at least one page of R⁺;
-   for each page of R⁺⁻ (except R) calculate its relevance score (hub score; see the “selecting the spots” section) with respect to R; note that this step includes the identification of the set of pages R⁺⁻⁺ pointed at by at least one subset of the elements of R⁺⁻;

Step m1-2: store, in a second memory, the URIs of the pages having a sufficient relevance score with respect to R, in relation to R, in such a way that on the basis of the URI of each of said pages having a sufficient relevance score with respect to R it is possible to retrieve¹⁹ (the second page) R as well as said sufficient relevance score;

¹⁹ (As well as the other second pages, as appropriate, for which the relevance score of R is sufficient)

Step m2: (in parallel with step m1) store in a first memory, for each user and each second page, the links that said user has added on said second page;

During access to a current page by a user:

(Step a is no longer necessary since the scores are already in memory).

Step b-m: Select from the second memory a certain number of second pages²⁰, from among the second pages used by said user (that are indicated in the first memory), for which the relevance scores of said current page are the highest (if they exist);

²⁰ Normally, in the second memory, the URIs of the second relevant pages with respect to a potential current page are already sorted by relevance score.

Step c (unchanged): retrieve from the first memory the links added by said user on the second pages selected in step b-m and present them to said user (with optionally the second pages on which they have been added and in a sorted manner).
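The interplay of the two memories in steps m1-2, m2, b-m and c can be sketched as follows. The in-memory dictionaries, the class name and the method names are assumptions made for illustration; the scores passed to `store_scores` are taken as already computed by relative distillation.

```python
from collections import defaultdict

class AssociativeStore:
    """Hypothetical in-memory model of the first and second memories."""

    def __init__(self):
        # First memory: (user, second page) -> links added (first page URIs).
        self.first_memory = defaultdict(list)
        # Second memory (inverse): potential current page -> {second page: score}.
        # It is not per user, so it can serve all users.
        self.second_memory = defaultdict(dict)

    def store_scores(self, second_page, scores, threshold):
        """Step m1-2: keep only the pages scoring at least `threshold`."""
        for current_page, score in scores.items():
            if score >= threshold:
                self.second_memory[current_page][second_page] = score

    def add_link(self, user, second_page, first_page):
        """Step m2: record a link added by `user` on `second_page`."""
        self.first_memory[(user, second_page)].append(first_page)

    def lookup(self, user, current_page, limit=3):
        """Steps b-m and c: best second pages used by this user, with their links."""
        used = {page for (owner, page) in self.first_memory if owner == user}
        stored = self.second_memory.get(current_page, {})
        ranked = sorted(((score, page) for page, score in stored.items() if page in used),
                        reverse=True)[:limit]
        return [(page, score, list(self.first_memory[(user, page)]))
                for score, page in ranked]
```

Because the second memory is keyed by the potential current page, a lookup needs no distillation at access time, which is what makes the response almost immediate.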

The improvements presented later in the “selecting the spots” section will also be applied. In particular as the scores are sharpened by successive iterations, the hub pages in step m1-1 and the authority pages in step m1-1′ also acquire relevance scores (hub scores and authority scores respectively) and may thus be included in the resulting set provided in step m1-2 (in addition to the URIs of the pages of R⁻⁺ in step m1-1 and/or of R⁺⁻ in step m1-1′). Moreover, here also the weights of the links between close pages are diminished so as to improve the results (see the “filtering” section).

With this latter method, the added links are presented almost immediately by the system in all cases, that is to say even when a current page is accessed by a user for the first time.

It was mentioned that during the associative storage step the user can increase his chances by adding a link to the first page on several second pages. He will now be allowed to form groups of second pages to which is added a link to the first page (the idea being that, as the first page may be of interest with respect to more than one center of interest of the user, the groups make it possible to class the first page with respect to distinct centers of interest, each group corresponding to a different center of interest).

Specifically, each time the user adds a link (to the first page) on a new second page, the group or groups of second pages that he had already formed, as appropriate, for the first page are proposed to him by the system and he can then choose one or more of these groups into which to insert said new second page, or otherwise he can create a new group formed of the single new second page.

At the same time he can also manipulate his groups more widely, for example delete a second page of a group, split a group into two, merge two groups, delete a group, etc. Finally, he can also duplicate a group so as to add thereto a link on another first page.

Each group is processed by the system as a relative distillation query. In a similar manner to the last method described²¹, for each query R (that is to say for each group of second pages) the system identifies and stores (then maintains; see later the “maintaining the spots” section) the potential current pages which have a sufficient relevance score, and thus forms an inverse memory able to provide, for each potential current page, the most relevant queries (that is to say the most relevant groups).

²¹ The difference is that here R represents a query formed of one or more resources whereas before R represented a single resource (a single second page).

Stated otherwise, the associative storage comprises the following steps:

(Step m1 is performed only for the queries not already known by the system or not sufficiently fresh, while step m2 is performed for all the users' queries, whether or not they are new for the system).

Step m1: Perform at least one of the steps m1-1 and m1-1′, then perform step m1-2:

Step m1-1:

-   identify the set R⁻ of pages which possess at least one link to a page of R;
-   identify the set R⁻⁺ of pages (seen as potential current pages) pointed at by at least one page of R⁻;
-   for each page of R⁻⁺ (except R) calculate its relevance score (authority score; see the “selecting the spots” section) with respect to R; note that this step includes the identification of the set of pages R⁻⁺⁻ possessing at least one link pointing at at least one subset of R⁻⁺ (see the “selecting the spots” section);

Step m1-1′:

-   identify the set R⁺ of pages to which at least one page of R possesses at least one link;
-   identify the set R⁺⁻ of potential current pages pointing to at least one page of R⁺;
-   for each page of R⁺⁻ (except R) calculate its relevance score (hub score) with respect to R; note that this step includes the identification of the set of pages R⁺⁻⁺ pointed at by at least one subset of R⁺⁻;

Step m1-2: Store, in a second memory, the URIs of the pages having a sufficient relevance score with respect to R, in relation to R, in such a way that on the basis of the URI of each of said pages having a sufficient relevance score with respect to R it is possible to retrieve²² R as well as said sufficient relevance score;

²² (From among the set of queries stored, as appropriate, for this page)

Step m2: (in parallel with step m1) store in a first memory, for each user and query, the added links (to first pages);

During access to a current page by a user:

Step b-m: Select from the second memory a certain number of queries, from among the queries (groups) used by said user as associative storage medium (that are indicated in the first memory), for which the relevance scores of said current page are the highest (if they exist);

Step c: retrieve from the first memory the links added by said user on the queries selected in step b-m and present them to said user, with optionally:

-   the (or a certain number of the) queries on which they have been added,
-   as well as a certain number of (links to) relevant pages having a relevance score estimated (in step m1-2) to be sufficient with respect to said queries selected in step b-m.²³

²³ These URIs are analogous to the “related links” mentioned in the “state of the art” section; however, they are more relevant since their relevance scores have been calculated with respect to the query with which they are associated by relative distillation.

The improvements presented later in the “selecting the spots” section will also be applied. In particular as the scores are sharpened by successive iterations, the hub pages in step m1-1 and the authority pages in step m1-1′ also acquire relevance scores (hub scores and authority scores respectively) and may thus be included in step m1-2 (in addition to the URIs of the pages of R⁻⁺ in step m1-1 and/or of R⁺⁻ in step m1-1′). Moreover, here also the weights of the links between close pages are diminished so as to improve the results (see the “filtering” section).

In step b-m, the system provides a set of selected queries. It would be advantageous to sharpen the selection in such a way as to present to the user the query or queries²⁴ that are the most relevant with respect to the user's browsing context. This is what will now be described.

²⁴ (With the first pages and the corresponding relevant links)

The history of a user's browsing is modeled with the aid of a “context stack”, in which each link (that may be presented to the user) is associated with a relevance score at each browsing level; when a link is nonexistent it is likened to a link whose score is equal to zero.

When the user clicks on a link and accesses a new page, the system adds a level to the context stack. On the other hand, when he clicks on the “back” command of his browser the system pops a level.

For a given link, the contextual score is an average of the noncontextual scores²⁵ at each level of the context stack, these scores being weighted as a function of depth. So as not to have to recalculate all the scores each time, an exponential weighting is used, this implying that the contextual score at a certain level is the weighted average of the noncontextual score at this level and of the contextual score at the previous level.

²⁵ (That is to say determined taking no account of the context)

Stated otherwise, for a given URI, s being the noncontextual score at the last level and r the contextual score at the previous level, the contextual score at the last level is: λ·r + (1 − λ)·s (λ being a constant weighting between 0 and 1, in principle less than ½: the larger λ is, the more important is the past).
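A context stack with this exponential weighting might be sketched as below. The class and method names are hypothetical, and the default weighting of 0.3 is an arbitrary value below ½. The sketch also assumes that the first (root) level simply takes its noncontextual scores as contextual scores, since there is no previous level to average with.

```python
class ContextStack:
    """Hypothetical context stack; each level maps a URI to its contextual score."""

    def __init__(self, lam=0.3):
        if not 0.0 <= lam <= 1.0:
            raise ValueError("lam must lie between 0 and 1")
        self.lam = lam
        self.levels = []

    def push(self, noncontextual):
        """The user follows a link: add a level and compute its contextual scores."""
        if not self.levels:
            # Root level (assumption): contextual score = noncontextual score.
            level = dict(noncontextual)
        else:
            previous = self.levels[-1]
            level = {}
            # A link missing at a level is likened to a link of score zero.
            for uri in set(previous) | set(noncontextual):
                r = previous.get(uri, 0.0)
                s = noncontextual.get(uri, 0.0)
                level[uri] = self.lam * r + (1.0 - self.lam) * s
        self.levels.append(level)
        return level

    def pop(self):
        """The user clicks 'back': drop the last level, return the new top."""
        if self.levels:
            self.levels.pop()
        return self.levels[-1] if self.levels else {}
```

Only the previous level is consulted on each push, so the cost of an update does not grow with the depth of the stack.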

Among the queries (that is to say the groups) selected in step b-m, the system selects those which are closest to the context, that is to say those for which the scores of the URIs stored in step m1-2 are the closest to the contextual scores for the user in question. To determine the closeness of each query to the context, the system calculates the sum, over the URIs of the query, of the products of the (noncontextual) score of each URI for the query with its contextual score for the user in question.

Step b-m is thus replaced by the following step b′-m:

Step b′-m: select from the second memory a certain number of queries, from among the queries (groups) used by said user as associative storage medium (and indicated in the first memory), for which the relevance scores of said current page are the highest (if they exist) and for which the relevance scores of the potential current pages are the closest to the contextual relevance scores.
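The closeness measure used to refine the selection in step b′-m is a sum of products, that is, a dot product between a query's stored scores and the user's contextual scores. A minimal sketch, with hypothetical function names and with scores held in plain dictionaries keyed by URI:

```python
def closeness(query_scores, contextual_scores):
    """Sum, over the URIs of the query, of (noncontextual score for the query)
    times (contextual score for the user). Absent URIs count as zero."""
    return sum(score * contextual_scores.get(uri, 0.0)
               for uri, score in query_scores.items())

def closest_queries(queries, contextual_scores, limit=1):
    """Rank the queries (groups) already selected in step b-m by closeness.

    queries: dict mapping a query identifier to its {URI: score} dictionary.
    """
    ranked = sorted(queries,
                    key=lambda q: closeness(queries[q], contextual_scores),
                    reverse=True)
    return ranked[:limit]
```

A query whose stored URIs the user has not met in the current context gets a closeness of zero and therefore drops to the bottom of the ranking.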

We shall now describe a method, utilizing the system of cookies, for recognizing the user when he goes from one site to another, in such a way as to be able to maintain his context stack.

Let us recall that the cookies system allows servers of sites of an Internet domain (i.e. domain name or IP address) to recognize a user (that is to say his computer) when he accesses web pages belonging to one and the same Internet domain.

The method described here allows a server which implements our method (it will be called a client server, CLI) to recognize even users who browse from one site to another which do not form part of one and the same Internet domain, even though in their browsing these users pass through sites that do not implement our method.

To do this, three communication mechanisms are used:

1. Each web page of a site of a client server contains a frame whose address is that of a centralized server (URS) which manages our method of recognizing the user (USER);
2. The centralized server and each client server each have a cookie stored in the user's computer (note that the creation time for these cookies may be used to estimate the reliability of recognition of the user);
3. The client server communicates with the centralized server directly.

There are three possible cases which are described hereinafter (see FIG. 2).

New user for the client server and for the centralized server:

1. The user (the USER computer) opens a page of the client site (CLI server); there is no CLI cookie.
2. CLI asks URS for a free identifier for USER and receives ID=“123456”.
3. CLI sends back a page comprising two frames to USER:
    -   the first frame is at the address http://URS.com/ . . . ?ID=“123456”
    -   the second frame is at the address http://CLI.com/ . . .
4. USER sends the http query to URS to ask for the content of the first frame (http://URS.com/ . . . ?ID=“123456”); as there is no cookie belonging to URS, URS concludes that this is a new user and allocates him the identifier “123456”.
5. URS responds and installs a cookie (containing ID=“123456”) at USER.
6. (In parallel with 5.) URS transmits [ID=“123456” (no replacement)] to CLI.
7. (In parallel with 4.) USER sends CLI the http query to ask for the content of the second frame.
8. (After receipt of the identifier at point 6.) CLI sends USER the content of the frame http://CLI.com/ . . .

New user for the client server but not for the centralized server:

1. USER opens a page of the client site (CLI server); there is no CLI cookie.
2. CLI asks URS for a free identifier for USER and receives ID=“123456”.
3. CLI sends back a page comprising two frames to USER:
    -   the first frame is at the address http://URS.com/ . . . ?ID=“123456”
    -   the second frame is at the address http://CLI.com/ . . .
4. USER sends the http query to URS to ask for the content of the first frame (http://URS.com/ . . . ?ID=“123456”) as well as the content of the cookie (created during a previous access and comprising the identifier ID=“ABCDEF”).
5. URS responds.
6. (In parallel with 5.) URS transmits [ID=“ABCDEF” replacing ID=“123456”] to CLI (+ optionally extra data specific to ID=“ABCDEF”).
7. (In parallel with 4.) USER sends CLI the http query to ask for the content of the second frame.
8. (After receipt of the identifier “ABCDEF” at point 6.) CLI sends USER the content of the frame http://CLI.com/ . . . as well as a new cookie comprising ID=“ABCDEF” as replacement for the previous one.

User already known to the centralized server and to the client server:

1. USER opens a page of the client site (CLI server) and transmits the content of the cookie associated with CLI (ID=“ABCDEF”).
2. (This step is not applicable.)
3. CLI sends back a page comprising two frames to USER:
    -   the first frame is at the address http://URS.com/ . . . ?ID=“ABCDEF”
    -   the second frame is at the address http://CLI.com/ . . .
4. USER sends URS the http query (http://URS.com/ . . . ?ID=“ABCDEF”, to ask for the content of the first frame) as well as the content of the cookie (created during a previous access and also comprising ID=“ABCDEF”).
5. URS responds.
6. (Optionally, CLI can ask for and/or receive extra data from URS for ID=“ABCDEF”.)
7. (In parallel with 4.) USER sends CLI the http query to ask for the content of the second frame.
8. CLI sends USER the content of the frame http://CLI.com/ . . . (as appropriate after receipt of the data in step 6.)
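The identifier logic common to the three cases can be simulated without any network code. The sketch below is a simplification under stated assumptions: the frames and http exchanges are collapsed into direct method calls, cookies are modeled as one dictionary entry per server, identifiers are generated as `id1`, `id2`, and so on, and the client server is assumed to install its own cookie at the end of every case. All class and method names are hypothetical.

```python
import itertools

class Browser:
    """Stands in for the USER computer: at most one cookie per server name."""
    def __init__(self):
        self.cookies = {}

class CentralServer:
    """Stands in for URS."""
    def __init__(self):
        self._counter = itertools.count(1)

    def free_identifier(self):
        """Step 2: hand a free identifier to a client server."""
        return f"id{next(self._counter)}"

    def serve_first_frame(self, browser, proposed_id):
        """Steps 4 to 6: return the identifier the client server must use."""
        known = browser.cookies.get("URS")
        if known is None:
            # New user for URS: adopt the proposed identifier, install the cookie.
            browser.cookies["URS"] = proposed_id
            return proposed_id
        # Known user: the existing identifier replaces the proposed one.
        return known

class ClientServer:
    """Stands in for CLI."""
    def __init__(self, name, urs):
        self.name = name
        self.urs = urs

    def open_page(self, browser):
        proposed = browser.cookies.get(self.name)
        if proposed is None:
            # No CLI cookie: ask URS for a free identifier.
            proposed = self.urs.free_identifier()
        final_id = self.urs.serve_first_frame(browser, proposed)
        # Step 8 (assumption): (re)install the CLI cookie with the final identifier.
        browser.cookies[self.name] = final_id
        return final_id
```

Opening a page on one client site, then on a second client site, then on the first again walks through the three cases in order, and the same identifier is returned each time.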

The method described above makes it possible to select the links to be displayed in the web pages as a function of the browsing context²⁶. This is what will now be described.

²⁶ (Or, as described above, to select the queries themselves; this being trivial, it is not described again)

Let us start from the situation where each query (the server which hosts it) possesses a set of initial URIs as well as the set of links that could be proposed to the user with their default scores: the noncontextual scores.

As already described, the contextual score is an average of the noncontextual scores, weighted as a function of depth, at each level of the context stack. Thus, $r_i$ being the noncontextual score at the last level and $\tilde{r}_i$ the contextual score at the previous level, its value after having followed a link is: $\tilde{r}_i \leftarrow \lambda \tilde{r}_i + \bar{\lambda} r_i$, with $\bar{\lambda} = 1 - \lambda$.²⁷

²⁷ Thus giving $\tilde{r}_i = \bar{\lambda} \sum_{n=0}^{d-1} \lambda^n r_{i,n} + \lambda^d r_{i,d}$ with $d$ the depth of the root and $r_{i,n}$ the score of page $P_i$ at depth $n$.
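As a worked check of this weighting, the update $\tilde{r}_i \leftarrow \lambda \tilde{r}_i + (1-\lambda) r_i$ can be unrolled by hand on a small stack. The values used here are illustrative assumptions only: $\lambda = 1/4$, a root at depth $d = 2$, and noncontextual scores $r_{i,2} = 1$ (root), $r_{i,1} = 0$ and $r_{i,0} = 1$ (current level). The root is taken to start with its noncontextual score.

```latex
\begin{align*}
\text{root } (n = 2):\quad & \tilde{r}_i = r_{i,2} = 1 \\
\text{level } n = 1:\quad & \tilde{r}_i \leftarrow \tfrac{1}{4}\cdot 1 + \tfrac{3}{4}\cdot 0 = \tfrac{1}{4} \\
\text{level } n = 0:\quad & \tilde{r}_i \leftarrow \tfrac{1}{4}\cdot\tfrac{1}{4} + \tfrac{3}{4}\cdot 1 = \tfrac{13}{16} \\
\text{closed form:}\quad & \tilde{r}_i = (1-\lambda)\left(\lambda^{0} r_{i,0} + \lambda^{1} r_{i,1}\right) + \lambda^{2} r_{i,2}
  = \tfrac{3}{4}(1 + 0) + \tfrac{1}{16} = \tfrac{13}{16}
\end{align*}
```

Both routes give the same value, and the root's contribution has already shrunk to 1/16 after two levels, which illustrates why a small λ makes the recent levels dominate.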

The links presented to the user are those which have the largest contextual score.

The context stack can be displayed in the URS frame (the first frame) introduced above. Thus the user can see which pages are the ones that were involved in the calculation of the pages to be displayed. He can click elements of the stack to climb back up the levels, and an “Erase” button makes it possible to empty the context stack.

The context stack is stored, for each user, in the centralized server (URS), with the user's identifier. Thus, each time a user opens a page at a client server (CLI), the latter, having obtained the user's identifier, will give URS the noncontextual scores²⁸, and URS will respond with the contextual scores after having performed the weighted average described above²⁹. The server of the client site may then display in the page the links which have the best score.

²⁸ To avoid unnecessary traffic it is possible to select the pages to be sent, taking only those that have a score greater than a certain threshold, for example half the threshold required in order for a page to be displayed to the user.

²⁹ This is performed within the framework of step 6 described above.

The steps are thus as follows (see FIG. 3):

-   1. The user (USER) sends an http query to open a page.-   2. The client server (CLI) transmits the noncontextual scores of the    page in question and the user's identifier to the centralized server    (URS)-   3. URS adds a level to the context and calculates the contextual    scores-   4. The contextual scores (at least the best of them) are returned to    the client server-   5. The client server selects the links which have the best score and    presents them to the user.

It may be beneficial to group the links in various parts of the pages, or even to hierarchize the parts, that is to say to allow parts to contain subparts in addition to links. Here are the changes that this involves:

-   -   The current context³⁰ must contain context information for each part of the page displayed; hence when the page sends its noncontextual scores, it sends as many sets of them as there are parts, and URS responds to it with a context for each part. To avoid certain problems (see the following points), a default context is also necessary, representing the page itself and its parts and aggregating all the scores of all the links. ³⁰ That is to say the set of contextual scores of the links at the current level.
    -   When the user clicks on a link, the context of the part which contains this link must be used as last-level context (i.e. that context will be used for the calculation of the scores at the subsequent levels). A means of obtaining this result is to place in the addresses of the links an argument which contains an identifier (unique for the page) of the part, which identifier is also transmitted to URS with the noncontextual scores.
    -   In the implementation of the method described here, care must be taken not to confuse the parts of various pages, for example if the user has opened several windows of his browser and clicks in a window after having clicked in another (URS stores only one context stack). This may be done by comparing the HTTP Referer field with the address of the last level of the stack and taking account of the part number only in the case of equality. In other cases (including when the user has passed through a page of a nonclient site), the default context is taken.
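The Referer check of the last point can be sketched as follows. This is only an illustration under stated assumptions: the layout of a stack level (a dictionary with a `url` and a `contexts` mapping from part identifier to contextual scores), the function name `select_context` and the key `"D"` for the default context are hypothetical choices, the latter borrowed from the default part “D” mentioned in the example that follows.

```python
def select_context(stack, referer, part, default_part="D"):
    """Return the context to use as last-level context for a click.

    stack: list of levels, root first; each level is a dict
           {"url": str, "contexts": {part_id: {page: score}}}.
    The part number is honoured only when the Referer equals the
    address of the last level of the stack; otherwise the default
    context is taken.
    """
    if not stack:
        return {}
    last = stack[-1]
    contexts = last["contexts"]
    if part is not None and referer == last["url"] and part in contexts:
        return contexts[part]
    return contexts.get(default_part, {})


stack = [{
    "url": "http://CLI.com/main.html",
    "contexts": {"1": {"x": 0.9}, "D": {"x": 0.5}},
}]
same_page = select_context(stack, "http://CLI.com/main.html", "1")
other_window = select_context(stack, "http://other.example/page.html", "1")
```

Here `same_page` is the context of part 1, whereas `other_window` falls back to the default context because the Referer does not match the last level of the stack.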

A more complete example (see FIGS. 4 and 5):

Here therefore is what happens when the user, already in a particular context (for the page CLI.com/main.html), clicks on a link http://CLI.com/index.html?part=1 (part=1 signifies that the user has clicked in part 1). It is assumed that the client server CLI does not yet know the user:

-   (1) The browser (USER) sends the query http://CLI.com/index.html?part=1 to the server of the client site (CLI), additionally giving it the Referer http://CLI.com/main.html (the address of this frame).
-   (2) CLI asks URS for a free number for this user (URS responds with 12345).
-   (3) CLI responds to (1) with a page comprising two frames whose addresses are http://URS.com/default.html?newID=12345 and http://CLI.com/main.html respectively. It also gives the user a temporary cookie (session cookie) newID=12345.
-   (4) The user being known to URS, he has a cookie with his true identifier (678910). By loading the frames, his browser will send a query for the page http://URS.com/default.html?newID=12345 with the cookie ID=678910.
-   (5) The user also sends a query for the page http://CLI.com/main.html with the session cookie newID=12345.
-   (6) Having received (5), the client CLI sends URS its address (http://CLI.com/main.html), its noncontextual scores for each part of the new page, the identifier newID=12345, as well as the part number (part=) that it received with the message (1).
-   (7) When it has received (4) and (6), URS looks at the context of the user for part 1 and verifies that the source page (http://CLI.com/main.html) corresponds to the last level of the context stack for this user (otherwise it would have ignored the part number and taken the default part (“D”)). Thereafter it calculates, for each part of the new page, the new contextual scores.
-   (8) URS, having received the message (6), can respond to the message (4) of the user (presenting him with the new context stack and the <ERASE> button).
-   (9) URS also responds to the message (6) from CLI, sending it the true identifier of the user (678910), as well as the contextual scores.
-   (10) CLI can now respond to the message (1), giving the user his true identifier (permanent cookie ID=678910, for the site CLI.com), as well as the personalized page.

The concept of user can in reality encompass several users who share added links (and the groups which serve as their support). Of course, a finer organization of the users according to the added links that they share is possible.

We shall now describe the case where an end user subscribes to a provider user so that, according to the context, the system proposes the groups and first pages (in the sense of the groups and first pages described hitherto) created by the provider user to the end user. The first pages may in particular be advertisements which (by virtue of the capabilities of the system as described hitherto) are automatically selected with respect to the context.

The groups created by the provider user and proposed by the system to the end user are called “spots”.

The provider user manipulates and utilizes the spots as described hitherto for the groups of second pages.

The end user can use a spot as storage medium by making a personal version thereof and adding thereto a link to a first page (this is described later).

The main advantage of this approach is that it reserves the possibility of creating new spots (and the expensive calculations of scores that they involve) to certain users only (namely the provider users), while offering the function of storage/associative search by way of pre-existing spots (which is not expensive in terms of machine resources) to all users.

Spot

The system that we shall now describe provides relevant links (also known as “related links”, see above the “state of the art” section). However, rather than searching for relevant links directly, our system first checks whether there exists a spot (or reference resource) whose associated links are sufficiently close to the current resource or to the browsing context of the user. If such is the case, the system returns the spot(s) whose associated links are the closest, as well as its associated links offered in the guise of relevant links.

Typically the spot is proposed in a window adjacent to the main window of the browser, like the existing systems providing “related links”; however, in contradistinction to these existing systems:

-   -   the system of the invention presents relevant links determined according to a relative distillation method (detailed later),
    -   the browsing context taken into account by our system is not necessarily solely the current page, but may include the set of resources accessed recently by the user (using the system) and which are relevant with respect to the current resource³¹, ³¹ See above the description of the method of selecting groups of second pages (here of spots) according to the user's browsing context.
    -   the spots serve as associative memory for the provider users; specifically, when a spot is presented to an end user, the links to first pages (or other added resources³², as described previously) added by the provider user who created the spot are presented to said end user³³, ³² The latter include in particular advertisements billed to promoters. Advantageously, these advertisements are relevant with respect to the context (in any event the spots which serve them as support are). ³³ (The latter possibly moreover being said provider user who created the spot.)
    -   the spots serve as associative memory for the end users; specifically, when the end user adds a link to a first page on a second page (as described hitherto), in reality he adds a link on his personal version of the spot proposed for this second page or for the current context.

Furthermore, presenting the end user with relevant links by way of spots offers advantages per se, such as prompting him to click in order to access the reference resource (that is to say the page presenting the spot).

Let us now examine a few typical storage/associative search scenarios implementing spots.

First Scenario of Use:

The provider user creates a new resource or chooses an existing resource (for example a web page which he wishes to access, or a particular element contained in a page . . . ) so as to make thereof the reference resource of a new spot.

To do this, he allocates it at least one given associated link pointing to a popular page.

The system completes the set of associated links³⁴ (as described in the “Selecting the Spots” section). ³⁴ This is the equivalent of the second memory described in the previous section.

Thus, in the future, each time an end user accesses a resource pointed out by one of the links associated with this spot, this spot may³⁵ be proposed to him. Also, as described in the subsequent two scenarios of use, end users may then use this new spot as storage medium (in a manner analogous to the use of a second page or of a group of second pages, described above). ³⁵ It will not necessarily be this spot that is proposed but rather, among all the spots whose associated links point to resources forming the current context, the spot (or spots) in which these associated links have the highest relevance scores. The selection of the spot (or spots) is described in the “Selecting the Spots” section.

The creator of this spot thus has the advantage not only of putting it to his own use but also of seeing it proposed to end users. As a link to the reference resource (prompting the user to click) is included in the presentation of the spot, the reference resource is thus promoted to the end users. Moreover, his added links (such as advertisements) on this spot will be presented to the end users.

Second Scenario of Use:

On the web the end user “lands” on a first page (or other type of resource) that is so interesting that he would like to store it in order to be able to retrieve it easily and land back on it spontaneously when he accesses resources that are relevant with respect to it.

Let us assume that no spot is spontaneously proposed by the system for this page.³⁶ ³⁶ In the converse case, on (his personal version of) this spot, the user will directly add a link to this first web page. Note however that this action is not strictly necessary. Specifically, even without doing anything, the user will be able to retrieve this first page by visiting a close page that is not very popular (in the guise of relevant link associated with this same spot or with a neighboring spot). However, by doing this action the user has the extra advantage of being able to retrieve it in the guise of link added explicitly by him, that is to say in such a way that it is made evident.

The user visits a (at least one) second page, which is relevant with respect to the first,

-   -   and for which he knows that a spot is proposed,
    -   or else he chooses a web page which is popular, since it is thus more probable that a spot is proposed for it,

and on the spot which is proposed for this second page he adds a link to this first page (for example by selecting a graphical object representing the first page and by dragging and dropping it onto the second page, as described at the start of the description).

In the future, this added link will then be presented to him spontaneously each time that this same spot, or a close spot, is proposed to him for the current context of his browsing.

Third Scenario of Use:

The end user wishes to store a private resource (such as a document which belongs to him and which is not published on the web). The private resource here plays the role of first page.

He accesses a (second) page which is relevant with respect to his private resource (and which preferably is popular, or for which he knows that a spot is proposed) and he adds thereto a link to his private resource (that is to say he inserts this link into his personal version of the spot proposed for this second page).

Optionally, to reinforce his action, he will also add a link (to his private resource) on yet (other spots which are proposed to him for) other second pages that he finds relevant with respect to his private resource.

In the future, a link to his private resource will be presented to him spontaneously each time that one of the spots that was proposed to him for the second page or pages, or a close spot, is proposed to him for the current context of his browsing.

Thus, in the last two scenarios above, a link to the first page is presented to the user spontaneously each time that he visits pages in the domain of relevance covered by the spots proposed for the second pages³⁷. ³⁷ And insofar as the second pages were chosen by the user because according to him they are relevant with respect to the first page, and the relevance relation is transitive at this level, a link to the first page is presented to the user spontaneously each time he visits pages which according to him are in the domain of relevance of the first page!

Selecting the Spots

Before the spot(s) selection step proper, the system must obtain the set of “completed associated links” from the set of “given associated links” (which are given by the provider user, as described in the first scenario of use).

Completing the Associated Links:

The set of resources pointed at by the given associated links is the query R.

The calculation of the completed associated links is performed by means of the “relative distillation” method, comprising the following steps:

-   Step 1: Identify the set R⁻ of resources which possess at least one link pointing at an element of R.
-   Step 2: Identify the set R⁻⁺ of resources pointed at by the elements of R⁻ (note that R⁻⁺ includes R).
-   Step 3: For each resource of R⁻⁺ calculate its authority score with respect to R. (This step can include the identification of a part of the resources of R⁻⁺⁻ possessing a link pointing to a resource of R⁻⁺)³⁸. ³⁸ The resources of R⁻⁺⁻ will start to be taken into account right from the first iteration, as described later.
-   Final step: Select the elements of R⁻⁺ having the largest authority scores.

The calculation of the scores in step 3 may be performed by calculating, for each resource of R⁻⁺, the ratio between

-   -   the cardinality of the set of resources which point to it AND to the resources of the query and
    -   the cardinality of the set of resources which point to it OR to the resources of the query

(or by means of one of the more complete equations described later, see in particular the equation for the quantity of common reasons, or homogeneity, of a set of resources).
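Steps 1 to 3 with the simple ratio above can be sketched as follows. The sketch assumes that the link graph is available as a dictionary mapping each resource to the set of resources it links to, and it reads “the resources of the query” as “at least one resource of the query”; both points, like the name `ratio_authority_scores`, are assumptions made for illustration.

```python
def ratio_authority_scores(links, query):
    """links: {resource: set of resources it links to}; query: iterable of resources."""
    query = set(query)
    # Step 1: R- = resources having at least one link into the query.
    r_minus = {src for src, targets in links.items() if targets & query}
    # Step 2: R-+ = resources pointed at by the elements of R-.
    r_minus_plus = set().union(*(links[src] for src in r_minus))
    # Step 3: ratio |B(c) AND R-| / |B(c) OR R-| for each candidate c.
    scores = {}
    for candidate in r_minus_plus:
        citing = {src for src, targets in links.items() if candidate in targets}
        scores[candidate] = len(citing & r_minus) / len(citing | r_minus)
    # Normalize so that the scores sum to 1.
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


links = {"h1": {"q", "a"}, "h2": {"q", "a", "b"}, "h3": {"b"}}
scores = ratio_authority_scores(links, {"q"})
```

In this small graph, page "a" is cited by exactly the pages that cite the query and so obtains the same score as "q", whereas "b", which is also cited by a page unrelated to the query, obtains a lower score.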

The authority scores are normalized (in such a manner that their sum becomes equal to 1).

The authority scores having been obtained, they can be put to use to allocate hub scores to the elements of R⁻:

-   Step 4: The hub score of each element of R⁻ is obtained by taking    the sum of the authority scores (calculated in step 3) of the    elements of R⁻⁺ to which it points. The hub scores are normalized    (in such a way that their sum becomes equal to 1).

Iteration restarting from step 3: the hub scores having been obtained, they can be put to use to sharpen the calculation of the authority scores. Step 3 then takes account of the hub scores so as not to consider all the elements of R⁻ on an equal footing (the resources of R⁻ pointing to resources having a higher authority score will thus have a greater influence). The cardinalities used to calculate the authority scores are thus replaced by weighted cardinalities. That is to say each hub resource, instead of counting for one, counts proportionately to its hub score. (The equations are detailed later).

Step 3 then includes the taking into account of the resources of R⁻⁺⁻ pointing to the resources of R⁻⁺ having the largest authority scores, in addition to R⁻ (a method optimizing the way in which R⁻⁺⁻ is taken into account is described later).

After step 3, we can optionally perform step 4 again, and so on and so forth until convergence, that is to say until the difference between the results obtained in the last iteration and those obtained in the previous iteration is negligible (in general, fewer than 10 iterations are sufficient).
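The alternation between steps 3 and 4 with weighted cardinalities may be sketched as below. This is a simplified reading, not the complete equations given later: it takes as hubs every page having a link into R⁻⁺ (that is, all of R⁻⁺⁻ rather than a selected part), starts from uniform hub scores, and uses the simple AND/OR ratio. The function name and parameters are hypothetical.

```python
def relative_distillation(links, query, max_iterations=10, tolerance=1e-9):
    """links: {resource: set of resources it links to}. Returns authority scores."""
    query = set(query)
    r_minus = {src for src, targets in links.items() if targets & query}
    candidates = set().union(*(links[src] for src in r_minus))  # R-+
    if not candidates:
        return {}
    # Assumption: hubs are all pages citing at least one candidate (R-+-).
    hubs = {src for src, targets in links.items() if targets & candidates}
    hub = {src: 1.0 / len(hubs) for src in hubs}
    authority = {}
    for _ in range(max_iterations):
        # Step 3 with weighted cardinalities: each hub counts for its hub score.
        raw = {}
        for page in candidates:
            citing = {src for src in hubs if page in links[src]}
            both = sum(hub[src] for src in citing & r_minus)
            either = sum(hub[src] for src in citing | r_minus)
            raw[page] = both / either if either > 0 else 0.0
        total = sum(raw.values())
        updated = {page: score / total for page, score in raw.items()}
        change = sum(abs(updated[p] - authority.get(p, 0.0)) for p in candidates)
        authority = updated
        if change < tolerance:
            break  # difference with the previous iteration is negligible
        # Step 4: hub score = sum of the authority scores of the pages cited.
        raw_hub = {src: sum(authority[t] for t in links[src] & candidates) for src in hubs}
        hub_total = sum(raw_hub.values())
        hub = {src: score / hub_total for src, score in raw_hub.items()}
    return authority


links = {"h1": {"q", "a"}, "h2": {"q", "a", "b"}, "h3": {"b"}}
scores = relative_distillation(links, {"q"})
```

On this three-hub example the scores settle after a few iterations, well within the ten iterations mentioned above.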

Variant for step 2: to form R⁻⁺, instead of taking all the links contained in the resources of R⁻ the system will take only the links located in the relevant regions of the resources of R⁻. As these relevant regions can be determined only from the moment at which the hub scores of the links that they contain are known, this variant will be implemented only from the first iteration onward, that is to say after having performed step 4 the system will iterate restarting from step 2 rather than from step 3.

Variant for Step 3:

With each link possessed by a resource of R⁻ (or of R⁻⁺⁻) is associated a weight equal to the complement of the closeness of the two resources connected by this link. Thus, the links connecting two close resources will be weakened, and the importance of the links between resources which mutually promote one another (for example because they form part of one and the same web site and mutually cite one another) is decreased. Once the links are thus weighted, the system calculates the authority scores, not now by using the sum of the hub scores, but the sum of the hub scores multiplied by their weights (this is detailed and illustrated by an example later).

The closeness of the two resources connected by the link in question is obtained by calculating the ratio between

-   -   the cardinality of the set of resources which point to the two connected resources and
    -   the cardinality of the set of resources which point to at least one of the connected resources

(or by means in particular of one of the more complete equations described later).
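The link weight of this variant can be illustrated with a short sketch. It assumes that the backward links B(P) are available as a dictionary from each resource to the set of resources citing it; the names `closeness` and `link_weight` are hypothetical, and only the simple ratio (not the more complete equations) is shown.

```python
def closeness(backlinks, a, b):
    """Ratio |B(a) AND B(b)| / |B(a) OR B(b)|; 0.0 when neither resource is cited."""
    cited_a = backlinks.get(a, set())
    cited_b = backlinks.get(b, set())
    either = cited_a | cited_b
    return len(cited_a & cited_b) / len(either) if either else 0.0


def link_weight(backlinks, source, target):
    """Weight of the link source -> target: the complement of their closeness."""
    return 1.0 - closeness(backlinks, source, target)


backlinks = {"x": {"p", "q"}, "y": {"q", "r"}}
weight_xy = link_weight(backlinks, "x", "y")      # closeness 1/3, weight 2/3
weight_self = link_weight(backlinks, "x", "x")    # identical citers, weight 0
```

A link between two resources cited by exactly the same pages thus receives a weight of zero, which is how mutual promotion within one site is attenuated.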

It is also advantageous to perform the same algorithm downstream, that is to say by calculating the hub scores of the resources of R⁺⁻ (which downstream cite the same resources as the query).

The downstream algorithms are identical to those upstream except that B (backward) is replaced by F (forward) and vice versa³⁹, and ⁻ is interchanged with ⁺ (e.g. R⁻⁺ is replaced by R⁺⁻). ³⁹ B(R_(i)) is the set of URIs of the pages having a link to the page R_(i). F(R_(i)) is the set of URIs of the pages to which R_(i) has a link.

Consideration will also be given, advantageously, to the hub resources upstream and the authority resources downstream, in such a way that the hub pages in step m1-1 and the authority pages in step m1-1′ also acquire relevance scores (hub scores and authority scores respectively) and may thus be included in the resulting set provided at step m1-2 (in addition to the URIs of the pages of R⁻⁺ and/or of R⁺⁻).

By completing the associated links of each new query (spot) introduced, the system forms an inverse memory able to provide, for each potential current resource corresponding to an associated link, the most relevant queries (that is to say the most relevant spots).

Stated otherwise, the associative storage now comprises the following steps:

-   (Step m0 is performed independently of the other steps. Step m1 is performed only for the queries, not already known by the system or not sufficiently fresh, introduced by a provider user, while step m2 is performed for each use of a query (that is to say of a spot) as associative storage medium by a provider user or an end user.)
-   Step m0: store (in a third memory) the usage rights for spots for each user.
-   Step m1:
-   Step m1-1 corresponds to completing the associated links as described hereinabove.
-   Step m1-2: store, in a second memory, the URIs of the resources having a sufficient relevance score with respect to R, in relation to R, in such a way that on the basis of the URI of each of said resources having a sufficient relevance score with respect to R it is possible to retrieve⁴⁰ R as well as said sufficient relevance score; ⁴⁰ (From among the set of queries stored, as appropriate, for this resource)
-   Step m2: (in parallel with step m1) store in a first memory, for each user and query, the added links (to first resources);

During access to a current resource by a user:

-   Step b-m: Select from the second memory a certain number of queries, from among the queries (spots) (indicated in the first memory) that said user has the right to use, for which the relevance scores of said current resource are the highest (if they exist) and for which the relevance scores of the associated links are the closest to the contextual relevance scores for said user;
-   Step c: Retrieve from the first memory the links added by said user on the queries selected in step b-m, as well as the links added by their creators (if they are different from said user), and present them to said user, with optionally:
    -   the (or a certain number of the) queries on which they have been added,
    -   as well as a certain number of (associated links to) resources having a sufficient relevance score (estimated in step m1-2) with respect to said queries selected in step b-m.

The relative distillation method will now be detailed.

The essential idea of the calculation of the relevance score (of a web page P₂ with respect to a given web page P₁) is as follows⁴¹: ⁴¹ Hereafter, we shall assume that P₁ and P₂ (or P_(i), P_(j), etc.) are web pages, although the methods described are far more general, as has already been mentioned. For example, it should be noted that instead of utilizing the hypertext links and the queries as mentioned hereinabove, the system may be based on analysis of the traces of the cutting and pasting of information fragments performed by the users (within the framework of creating and manipulating information resources), so as to automatically suggest other fragments which might enrich these resources. These traces may in fact be likened to links. For example, when part of a web page is copied into a document, the system is capable of deducing therefrom and of storing the existence in the document of a link to the web page, and the same mechanisms described here may then be applied. Moreover, the method described here may advantageously be applied by likening the links from one resource to another resource to links from a user to a resource that he likes (that is to say to a resource which interests him). It is thus possible to determine the quantity of common reasons (between several resources) to be liked by users. This can in particular serve to categorize these resources.

-   Let p₁ be the probability⁴² that a random author (of a web page) places a link on P₁ in a page. ⁴² The probability of being interested in a (or certain) page(s) is approximated by counting the number of pages which have a link on it (them) and by dividing this number by an estimate of the number of pages which could have had one.
-   Let p₂ be the probability that a random author places a link on P₂ in a page.
-   Let p_(1&2) be the probability that a random author places a link on P₁ and a link on P₂ in a page.
-   B(P_(i)) is the set of URIs of the pages having a link to the page P_(i).
-   F(P_(i)) is the set of URIs of the pages to which P_(i) has a link.

The relevance of a page with respect to a set of pages may be defined by the “quantity of common reasons” to be interested in all these pages.

Algebraic calculations make it possible to obtain equations giving the quantity of common reasons between several pages. This quantity (or closeness, or else homogeneity) is denoted x, subscripted with the pages concerned; the probability of being linked to a certain page P_(i) is denoted p_(i); the probability of being linked to at least one page out of P_(i), P_(j), . . . , P_(n) is denoted p_(ij . . . n):

$\overline{x_{ij}} = \frac{\overline{p_{i}} \cdot \overline{p_{j}}}{\overline{p_{\emptyset}} \cdot \overline{p_{ij}}}, \qquad \overline{x_{ijk}} = \frac{\overline{p_{i}} \cdot \overline{p_{j}} \cdot \overline{p_{k}} \cdot \overline{p_{ijk}}}{\overline{p_{\emptyset}} \cdot \overline{p_{ij}} \cdot \overline{p_{ik}} \cdot \overline{p_{jk}}},$ and so on and so forth (all the subsets of odd size in the numerator, and the others in the denominator)⁴³. ⁴³ The bars above indicate complements, and p_(ø), the probability of liking at least one page of an empty set, is a constant equal to zero; it is present in the equation for reasons of consistency.

This equation may be denoted more compactly thus:

$\overline{x_{S}} = \prod_{P \subseteq S} \overline{p_{P}}^{\,\sigma_{P}}$ with $\sigma_{P} = (-1)^{|P|+1}$, that is to say +1 for the subsets P of odd size and −1 for those of even size (the empty set included).
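By way of illustration, the quantity of common reasons can be evaluated from backward-link sets as in the following sketch. It follows footnote 42 in approximating each probability by a count of citing pages divided by an estimate of the number of pages that could have cited; the parameter `universe_size` stands for that estimate, and the function name `common_reasons` is hypothetical.

```python
from itertools import combinations


def common_reasons(backlinks, pages, universe_size):
    """Quantity of common reasons x_S for the set `pages`.

    backlinks: {page: set of pages citing it}.
    Computes the complement of x_S as the product, over all subsets P of S,
    of (1 - p_P) ** sigma_P, with sigma_P = +1 for odd-sized subsets and
    -1 for even-sized ones. Raises ZeroDivisionError if an even-sized
    subset is cited by every page of the universe.
    """
    pages = list(pages)
    x_bar = 1.0
    for size in range(len(pages) + 1):
        sigma = 1 if size % 2 == 1 else -1
        for subset in combinations(pages, size):
            # Pages citing at least one element of the subset.
            citing = set().union(*(backlinks[p] for p in subset))
            p_bar = 1.0 - len(citing) / universe_size
            x_bar *= p_bar ** sigma
    return 1.0 - x_bar


# Four possible citing pages (1..4).
independent = common_reasons({"a": {1, 2}, "b": {1, 3}}, ["a", "b"], 4)
identical = common_reasons({"a": {1, 2}, "b": {1, 2}}, ["a", "b"], 4)
```

In the first case the two pages are cited independently and the quantity of common reasons is zero; in the second they are cited by exactly the same pages and it equals 0.5.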

The probabilities concerned above involve the number (the count) of pages of R⁻ which contain a given link or a link from among a set of given URIs (to pages of R⁻⁺). It would be beneficial to weight this number by the quality of citation (hub score, described later) of each page which contains such a link.

It would thus be desirable for a page of R⁻ citing more, and better, pages (of R⁻⁺) to be regarded as being of better quality of citation, and in return for a higher weight to be given to it within the framework of the calculation of the scores⁴⁴ of the pages that it cites (R⁻⁺), the scores of the pages of R⁻ and those of the pages of R⁻⁺ mutually influencing one another in an iterative approach (bipartite reinforcement) which converges⁴⁵. ⁴⁴ Recall that here one is dealing with relevance scores with respect to the query, in contradistinction to the state of the art which makes it possible to determine a score of quality “in the absolute”. ⁴⁵ Note that the calculation of the relevance score of a page of R⁻⁺ may result in a negative value (that we will then neutralize; this is described later). Specifically, certain pages may not only fail to be close to the query, but even be antagonistic with respect to it (the fact of being of interest thereto decreases the chances of liking the pages of the query and vice versa).

The number of pages of R⁻⁺⁻ citing each candidate page (that is to say each page of R⁻⁺) also comes into the calculations. However, it is expensive to take them into account.

Hence, the results will be approximated by considering only those pages of R⁻⁺⁻ which cite the candidate pages having a good score, this score being calculated firstly by considering only R⁻ and subsequently by extending this set to R⁻⁺⁻ gradually. To calculate the relevance score of a candidate page, instead of taking the result of the equation for the quantity of reasons directly, it is preferable

-   -   to take it together with the overall cardinalities replaced by the total of the hub scores of the pages in question and
    -   to multiply this result by the authority score of the candidate page (simply calculated on the basis of the total of the hub scores of the citing pages), so as thus to weaken the pages which are relatively less reliable (being less popular as they are).

After a first iteration, in the citing pages the system can

-   -   label the regions containing links directed to pages of R⁻⁺ having a good score
    -   and already begin to prune the links which are not situated in these regions.

As the links in question are located under nodes of a typically tree-like document structure (such as in HTML in particular), to determine a relevance region it suffices to take the (minimal) nodes which encompass all the good links and to take away from them the (maximal) subnodes which contain a bad link (score too low, or URI explicitly refused) and which contain no good link (sufficient score).

The algorithm makes it possible, having a homogeneous set (having sufficient homogeneity) of URIs associated with close pages, to obtain a list of URIs of pages which are relevant in regard to this set. The way in which this algorithm may be utilized to obtain a set of relevant pages for an inhomogeneous set will be described later.
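The determination of a relevance region described above (minimal enclosing node, minus the maximal subnodes holding a bad link and no good link) can be sketched on a simplified document tree. The `Node` class is a hypothetical stand-in for an HTML element, links are assumed to sit on leaf nodes, and the function names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    link: Optional[str] = None  # URI carried by this node, if it is a link
    children: List["Node"] = field(default_factory=list)


def links_under(node: Node) -> Set[str]:
    found = {node.link} if node.link is not None else set()
    for child in node.children:
        found |= links_under(child)
    return found


def region_root(node: Node, good: Set[str]) -> Node:
    """Minimal node encompassing all the good links found under `node`."""
    holders = [c for c in node.children if good & links_under(c)]
    if len(holders) == 1 and node.link not in good:
        return region_root(holders[0], good)
    return node


def kept_links(node: Node, good: Set[str], bad: Set[str]) -> Set[str]:
    """Links kept under `node` after removing maximal bad-only subnodes."""
    here = links_under(node)
    if here & bad and not here & good:
        return set()  # maximal subnode with a bad link and no good link
    kept = {node.link} if node.link is not None else set()
    for child in node.children:
        kept |= kept_links(child, good, bad)
    return kept


def relevance_region(root: Node, good: Set[str], bad: Set[str]) -> Set[str]:
    return kept_links(region_root(root, good), good, bad)


content = Node(children=[
    Node(children=[Node("good1"), Node("neutral1")]),
    Node(children=[Node("bad1"), Node("neutral2")]),
    Node("good2"),
])
page = Node(children=[Node(children=[Node("ad")]), content])
region = relevance_region(page, good={"good1", "good2"}, bad={"bad1"})
```

In this example the region is the `content` node; the block holding "bad1" is removed together with its neighbor "neutral2", and the link "ad" is pruned because it lies outside the region. This sketch covers only the region step; the inputs of the distillation algorithm proper are listed next.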

As input, this algorithm takes

-   -   a set K of reference URIs (“Kernel”)
    -   a set A of candidate URIs (“Authority”)
    -   a set H of hub candidate URIs
    -   a set T of URIs to be refused (“Trash”)

We have: K⁻⊂H⊂A⁻ and T∩K=ø. (E being a set of URIs,

$E^{-} = \bigcup_{P_{i} \in E} B(P_{i})$ and $E^{+} = \bigcup_{P_{i} \in E} F(P_{i})$.)

-   1. With each page P_(i) of H, associate a number h_(i), its hub score⁴⁶, initially set to

$\frac{1}{|H|}$. ⁴⁶ Thus, advantageously, the sum of the |H| scores h_(i) is equal to 1.

-   2. (Re)calculate the authority scores:    -   a. For each page P_(i) of A, beginning with those of K,        associate a number a_(i), its authority score, equal to

$\sum_{j} l_{ji} \cdot h_{j}$, where $l_{ji} = \begin{cases} 0 & \text{if there is no link from } P_{j} \text{ to } P_{i} \\ 1 & \text{if there is a link from } P_{j} \text{ to } P_{i}. \end{cases}$

-   -   b. A possible but dangerous optimization: if, for certain pages,        a_(i) is sufficiently close to its value calculated previously        (as appropriate), and if the authority scores of the pages of K        have not varied either, we can keep the old value of r_(i) for        this page, to save on calculations.

-   3. (Re)calculate the relevance scores:
    -   a. For each page P_(i) of A calculate r_(i) ⁺, equal to w_(i∪K):

$r_{i}^{+} = w_{i \cup K}$

and in the case where the result is negative (case of a page antagonistic to R) neutralize the incoming links in such a way as to have r_(i) ⁺=0.

The upstream homogeneity w_(S) of a set S is defined as follows:

$\overline{w_{S}} = \prod_{P \subseteq S} \overline{a_{P}}^{\,\sigma_{P}}$, where

-   -   σ_(P) = −1 if P contains an even number of pages, and +1 otherwise;
    -   $a_{P} = \Delta \sum_{j} h_{j} l_{jP}$, where Δ is an arbitrary constant less than but close to 1 (it serves to avoid divisions by zero but does not change the principle of the algorithm; if the set H is larger than K⁻ then this constant may be equal to one);
    -   $l_{jP} = \begin{cases} +1 & \text{if } \exists P_{i} \in P \text{ such that } l_{ji} = +1 \\ 0 & \text{otherwise} \end{cases}$ with $l_{ji} = \begin{cases} 0 & \text{if there is no link from } P_{j} \text{ to } P_{i} \\ 1 & \text{if there is a link from } P_{j} \text{ to } P_{i}. \end{cases}$

Stated otherwise, l_(jP) is equal to 1 if there is a link

-   -   from a page P_(j) (of H)
    -   to at least one page P_(i) of P

and zero otherwise.

This signifies quite simply that a_(P) is the total of the hub scores of the pages (of H) which point at at least one page of P (P being the current subset of S which is considered).

For each existing link l_(ji), it is possible to associate with it a weight as a function of the closeness of the pages P_(i) and P_(j) and thus to improve the result (see later).

Here, since ∀P_(i)∈K we have r_(i) ⁺=w_(K) (the relevance is the same for all the pages P_(i) of K), the relevance score r_(i) ⁺ has to be calculated only once for the pages of K (besides, it will already be calculated during the procedure for chopping the query R into subqueries (kernels) K, and will therefore already be known on entry to the procedure).

-   b. (This point will be skipped the first time). To have their sum    equal to 1, we must divide each r_(i) ⁺ by the sum Σ|r_(i) ⁺| of all    the absolute values of the r_(i) ⁺.

Let

$\delta = \sum\limits_{i}\left| r_{i} - \frac{r_{i}^{+}}{\sum\left| r_{i}^{+} \right|} \right|$ be the global variation of the relevance score.

If δ<ε (ε>0 being a margin of error) we assume convergence has occurred and the method stops. Otherwise, the method continues.

-   c. We replace r_(i) by its normalized value:

$r_{i} \mapsto \frac{r_{i}^{+}}{\sum\limits_{i}\left| r_{i}^{+} \right|}$

a friction factor τ also being able to be used:

$r_{i} \mapsto \tau\, r_{i} + \overline{\tau}\,\frac{r_{i}^{+}}{\sum\limits_{i}\left| r_{i}^{+} \right|}$ (with τ ∈ [0; 1[ and τ̄ = 1 − τ; we shall preferably take a very small value, e.g. 0.01, so that in cases where this is not necessary the number of iterations does not change).

-   4. ⁴⁷ For each page P_(i) of H: ⁴⁷ This point may possibly be ignored after the first time.
    -   a. Find all the links which point at a page having a relevance score larger than a threshold epsilon to be chosen (ε>0).
    -   b. Find I_(i), the smallest HTML element⁴⁸ containing all of the links found in point a above. ⁴⁸ (Or other analogous representation . . . )
    -   c. For each link pointing at a page of T (if T is not empty), find the largest HTML element containing it (if there is one) and not containing any link found in point a. above, and remove it from I_(i).
    -   d. We keep all the links remaining in I_(i) and we delete the others (or else we neutralize them by setting their l_(ij) to zero).
-   5. Recalculate the hub scores:
    -   a. For each page P_(i) of H, calculate h_(i) ⁺ = Σ_(j) l_(ij) r_(j), the sum of the relevance scores of the pages pointed at.
    -   b.

$h_{i} \mapsto \frac{h_{i}^{+}}{\sum\left| h_{i}^{+} \right|}$ (The division by Σ|h_(i) ⁺| is, as for the relevance score, so as to keep their sum equal to 1). Then return to point 2.
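Points 3.b, 3.c and 5 of the procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the definitive implementation: the raw scores r_(i) ⁺ are taken as given, the scores are held in dictionaries, and the names `normalize`, `update_relevance` and `update_hubs` are hypothetical.

```python
def normalize(scores):
    """Divide each score by the sum of the absolute values of all scores."""
    total = sum(abs(v) for v in scores.values())
    return {k: (v / total if total else 0.0) for k, v in scores.items()}

def update_relevance(r, r_plus, tau=0.01, eps=1e-6):
    """Points 3.b and 3.c: normalize r+, measure the global variation
    delta, and blend old and new scores with the friction factor tau."""
    target = normalize(r_plus)
    delta = sum(abs(r.get(i, 0.0) - target[i]) for i in target)
    new_r = {i: tau * r.get(i, 0.0) + (1.0 - tau) * target[i] for i in target}
    return new_r, delta < eps  # second value: True when convergence is assumed

def update_hubs(links, r):
    """Point 5: h_i+ is the sum of the relevance scores of the pages
    that P_i points at; the result is normalized to sum to 1."""
    h_plus = {i: sum(r.get(j, 0.0) for j in targets) for i, targets in links.items()}
    return normalize(h_plus)

# Hypothetical data for one iteration.
r, converged = update_relevance({"A": 0.5, "B": 0.5}, {"A": 3.0, "B": 1.0})
hubs = update_hubs({"h1": {"A"}, "h2": {"A", "B"}}, r)
print(r, converged, hubs)
```

The caller is expected to stop iterating when the second return value is true, and otherwise to continue with the hub update and return to point 2.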

Initially, so as to process only a reduced number of pages, the relevance scores may be calculated on the basis of R⁻ (if we took H=R⁻). Hence, this will only be an approximation. Specifically, for the scores to be correct, they have to be calculated based rather on H=R⁻⁺⁻. However, as the construction of R⁻⁺⁻ is relatively expensive, we shall take only a subset: for R⁻⁺⁻ we shall take only the pages pointing at the pages of A which have a good score.

Thus⁴⁹, a substep will be added before the end of step 2.a: ⁴⁹ Several procedures may be used; here we present the preferred one.

2.a.1. In the case where the score r_(i) ⁺ of the current page (P_(i) of A) is sufficient⁵⁰, r_(i) ⁺ is recalculated after having inserted the new pages of B(P_(i)) into H: H ↦ B(P_(i)) ∪ H. ⁵⁰ (That is to say greater than a chosen threshold; this threshold can be dependent on the current cardinality of H; specifically, the closer we get to R⁻⁺⁻ (e.g. H_(final)), the more chance the calculated score has of being correct.)

We introduce an authority score for the pages of A, and the equation for r_(i) ⁺ is r=w_(i∪K)·a_(i) (rather than r=w_(i∪K)). The new coefficient a_(i) will make it possible to weaken the pages that are not very reliable (because they are not very popular). Furthermore, the equation will be more consistent insofar as the relevance score will no longer be the same for all the pages of the query.

The procedure is now as follows:

-   1. This point is the same as that of the algorithm for calculating relevance scores presented above.
-   2. This point does not change either.
-   3. (Re)calculate the relevance scores:
    -   a. For each page P_(i) of A calculate r_(i) ⁺, equal to w_(i∪K)·a_(i), and in the case where the result is negative (case of a page antagonistic to R) neutralize the incoming links so as to have r_(i) ⁺=0.
    -   b. Resume from point 3.b of the previously presented algorithm for calculating relevance scores.

Filtering:

For each existing link l_(ji), it is possible to associate therewith a weight dependent on the closeness of the pages P_(i) and P_(j) and to thus improve the result. This makes it possible to decrease the importance of the links between pages which mutually promote one another. Typically one thus succeeds in filtering for example the links of the “abstracts” and other “menus” which, repeatedly, are located in all the pages of a site.

The basic idea consists in weakening the links connecting two pages that we know to be close, by assigning a weight to each link, which weight will be equal to the complement of the closeness of the two connected pages (the greater the closeness, the more the link must be weakened). Once the links have thus been weighted, it is possible to calculate the homogeneity of a set of pages using the sum of their weights, rather than the number of citing pages.

In point 3.a of the algorithm, in the definition of the authority score we replace

the binary l_(jP) with a weighted l_(jP), where $l_{jP} = \min\left( 1;\ \max\limits_{P_{i} \in P}\left( l_{ji}\,?\,\overline{x_{ji}} \right) \right)$

Explanations:

-   -   l_(ji) ? x̄_(ji) is the complement of the closeness between page P_(j) and page P_(i) if there is a link from page P_(j) to page P_(i), and zero otherwise

$\max\limits_{P_{i} \in P}\left( l_{ji}\,?\,\overline{x_{ji}} \right)$

-   -   is the complement of the closeness between the page P_(j)∈H in question and the page P_(i)∈P for which the link between P_(j) and P_(i) exhibits the minimum closeness

$\min\left( 1;\ \max\limits_{P_{i} \in P}\left( l_{ji}\,?\,\overline{x_{ji}} \right) \right)$

-   -   signifies that this value is truncated above to 1

and always $l_{ji} = \begin{cases} 0 & \text{if there is no link between } P_{j} \text{ and } P_{i} \\ 1 & \text{if there is a link between } P_{j} \text{ and } P_{i} \end{cases}$

Stated otherwise, if there is at least one link

-   -   from the page P_(j) (of H) in question
    -   to a page P_(i) of P,

l_(jP) is equal to the complement of the closeness between page P_(j) and the page P_(i) which is the least close to it and to which it possesses a link. Σ_(j) l_(jP) is the sum of the weights thus associated with the pages of H which point at at least one of the pages of the subset P considered.
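The weighted l_(jP) described above can be sketched in Python as follows. The sketch assumes that closeness values lie between 0 and 1 and are stored per (citing page, cited page) pair; the names `filtered_link_weight`, `sum_filtered_weights`, `links` and `closeness` are hypothetical.

```python
def filtered_link_weight(j, P, links, closeness):
    """l_jP: complement of the closeness between P_j and the page of P
    that is least close to it among those it links to, truncated above
    to 1; zero when P_j links to no page of P."""
    weights = [1.0 - closeness.get((j, i), 0.0)
               for i in P if i in links.get(j, set())]
    return min(1.0, max(weights)) if weights else 0.0

def sum_filtered_weights(P, H, links, closeness):
    """Sum over the pages P_j of H of the weighted l_jP."""
    return sum(filtered_link_weight(j, P, links, closeness) for j in H)

# Hypothetical data: h1 is very close to A (a site menu link, say).
links = {"h1": {"A", "B"}, "h2": {"B"}, "h3": set()}
closeness = {("h1", "A"): 0.9, ("h1", "B"): 0.2, ("h2", "B"): 0.5}
print(sum_filtered_weights(["A", "B"], ["h1", "h2", "h3"], links, closeness))
```

A citing page thus counts for less than one page as soon as every page of P that it links to is close to it.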

To determine the closeness x_(ji), we can take the equation (already described) for the quantity of common reasons:

$\overline{x_{AB}} = \frac{\overline{p_{A}} \cdot \overline{p_{B}}}{\overline{p_{\varnothing}} \cdot \overline{p_{AB}}}$

FIG. 6 presents an example where the number of pages pointing at page Ais equal to 0.9+0.2+0.4+0.8=2.3

The number of pages pointing at page B is equal to 0.9+0.1+0.3+0.5=1.8

The number of pages pointing at A or B (Np_(AB)) is equal to 0.9+0.2+0.9+0.8+0.3+0.5=3.6

Thus, if we assume that |H|+h=100, the calculation of the closeness of A and B gives:

$\overline{x_{AB}} = \frac{\overline{p_{A}} \cdot \overline{p_{B}}}{\overline{p_{\varnothing}} \cdot \overline{p_{AB}}} = \frac{0.977 \cdot 0.982}{1 \cdot 0.964}$, this giving $\tilde{x}_{AB} = \frac{x_{AB}}{p_{B}} \approx 0.264 = 26.4\%$.
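The arithmetic of this example can be checked with a short Python sketch. It assumes, as the figures above suggest, that each p is the weighted number of citing pages divided by 100, that the overline denotes the complement 1 − p, and that the complement of p_(∅) is 1; the function name `closeness_complement` is hypothetical.

```python
def closeness_complement(n_a, n_b, n_ab, total, n_none=0.0):
    """Complement of the closeness of A and B from the weighted numbers
    of pages pointing at A, at B, and at A or B."""
    p_a, p_b, p_ab, p_none = (n / total for n in (n_a, n_b, n_ab, n_none))
    return ((1.0 - p_a) * (1.0 - p_b)) / ((1.0 - p_none) * (1.0 - p_ab))

x_bar = closeness_complement(2.3, 1.8, 3.6, 100.0)
x = 1.0 - x_bar                 # closeness of A and B
x_tilde = x / (1.8 / 100.0)     # closeness relative to p_B
print(round(x_tilde, 3))        # about 0.264, i.e. 26.4%
```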

The filtering described above uses a weight x̄_(ji). Since we now have the scores⁵¹ of the citing pages, we can optionally improve the method by taking

$\overline{x_{ij}\,?\,\overline{h_{j}}}$ as weight (instead of x̄_(ji)), where h_(j) is the score of the citing page (weakening a link, from a citing page P_(j) to a cited page P_(i), further when the score of the citing page P_(j) is low). ⁵¹ (Whether absolute or with respect to the query)

It should be noted that in order to calculate the closeness x_(ji) between two connected pages P_(i) and P_(j), instead of using the equation for the quantity of reasons as illustrated hereinabove, it is possible to calculate the ratio between:

-   -   the cardinality of the set of pages which point to P_(i) AND P_(j)
    -   and the cardinality of the set of pages which point to P_(i) OR P_(j).

Determination of the Homogeneous Subsets of a Query:

We provide the system with a set R of pages and possibly a set R_(x) of pages that we explicitly do not want (R∩R_(x)=ø). The system will identify within R at least one group of “homogeneous” pages and will launch a separate sub-query on this or each group. These groups are called “kernels”. To form the response, we shall then take a combination of the scores obtained. This method thus comprises the following steps:

-   1. For each page P_(i) of R, find B(P_(i)), the set of pages citing P_(i).
-   2. Find

$R^{-} = \bigcup\limits_{P_{i} \in R} B(P_{i})$, the set of pages citing at least one page of R.

-   3. In the pages of R which are not yet in a kernel (at the start none is), find the page P_(B) having the largest set B(P_(B)) of incoming links⁵² and create a kernel containing only this page. This kernel is now K_(C), the current kernel under construction (at any instant there is just one of them). If all the pages are already located in at least one kernel then go to point six. ⁵² In the case where we have the authority scores of the pages, or some other popularity score, we prefer in fact to base ourselves on them.
-   4. Find the relevant pages with respect to K_(C) (using the algorithm for calculating relevance scores) with
    -   H=R⁻
    -   A=R
    -   K=K_(C)
    -   T=R_(X)
-   5. Let P_(N) be the page of R, not yet in K_(C), which has the highest relevance score. If its relevance score is less than a fixed minimum score, return to point 3 (the current kernel is now complete). Otherwise insert it into K_(C) and go back to point four. It should be noted that it will not be necessary to reinitialize the hub and authority scores; it is preferable to keep the latest values calculated, so that the convergence ought to be very fast.
-   6. We now have a set of kernels (upstream homogeneous sub-queries) ready to be used as described in this document. When we want to calculate the relevance scores globally to the whole query we calculate an arithmetic average of the results for each of the kernels.

homogeneity equation

As a variant, instead of basing ourselves on the homogeneity equation

$\overline{x_{S}} = \prod\limits_{P \Subset S} \overline{p}_{P}^{\,(-1)^{|P|}}$ as described hitherto, the example method for calculating relevance scores can be based on another homogeneity equation, such as for example

$x_{S} = \frac{\left| \bigcap\limits_{P_{i} \in S} B(P_{i}) \right|}{\left| \bigcup\limits_{P_{i} \in S} B(P_{i}) \right|}$ or else $x_{S} = \frac{\left| \bigcap\limits_{P_{i} \in S} B(P_{i}) \right|}{\left| \bigcup\limits_{P_{i} \in S} B(P_{i}) \right|} \cdot \overline{\left( \frac{\min\limits_{P_{i} \in S}\left| B(P_{i}) \right|}{\max\limits_{P_{i} \in S}\left| B(P_{i}) \right|} \right)}$ in which the set cardinalities (represented between vertical bars) are replaced by the total of the hub scores of the pages in question⁵³. ⁵³ We can say that the cardinalities are replaced by “weighted cardinalities”, the weights being the hub scores.

Downstream Processing:

Instead of searching for the good pages in relation to those of a kernel from among the pages that are cited in common with them it may be beneficial to perform the same algorithms in the other direction, i.e. by searching among the pages which cite the same pages as the kernel, or even to perform both and to calculate an arithmetic average.

The downstream algorithms are identical to those upstream except that B is replaced by F and F is replaced by B, and ⁻ is interchanged with ⁺ (for example R⁻⁺ is replaced by R⁺⁻).
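One way to picture this symmetry, assuming that F(P) denotes the set of pages cited by P (by symmetry with B(P), the set of pages citing P), is to reverse every link and run the unchanged upstream code on the reversed graph. The helper below is a hypothetical sketch of that reversal.

```python
def reverse_links(links):
    """Swap the roles of B and F: every link src -> dst becomes
    dst -> src, so that upstream code applied to the result performs
    the downstream processing."""
    reversed_links = {}
    for src, targets in links.items():
        reversed_links.setdefault(src, set())
        for dst in targets:
            reversed_links.setdefault(dst, set()).add(src)
    return reversed_links

# Hypothetical forward links F: page -> pages it cites.
forward = {"a": {"b", "c"}, "b": {"c"}}
backward = reverse_links(forward)  # B: page -> pages citing it
print(backward)
```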

The upstream and downstream methods may advantageously be integrated in the following manner: after the upstream processing (possibly even after each upstream iteration), with the candidate pages (R⁻⁺) having obtained a sufficient relevance score, we associate downstream a set of extra pages (“artificial pages”) whose cardinality is dependent on said relevance score. Each artificial page is also cited by (at least) one page of the query. The scores of these good pages (of R⁻⁺) found upstream⁵⁴ are thus given downstream an “advantage”, and consequently the scores of the pages (of R⁺⁻⁺) cited as appropriate by these good pages are also indirectly given an advantage. ⁵⁴ Note that, advantageously, this is done without amalgamating the relevance scores upstream and downstream.

And conversely, after the downstream processing (possibly even after each downstream iteration), the same method is applied symmetrically upstream. Thus the good pages of R⁺⁻ are favored, as are indirectly the pages (of R⁻⁺⁻) which cite them, as appropriate.

By not amalgamating the scores upstream (of the pages R⁻⁺) with the scores downstream (pages R⁺⁻) it is possible to dissociate them in the calculations. In particular, the influence of the scores obtained downstream can be decreased in the upstream processing or vice versa.

Moreover, by virtue of this idea of “artificial pages”, the present method may be applied as a complement to the existing procedures of the prior art. Specifically, once the scores have been obtained for each page, the respective numbers of citing and cited pages can be modified artificially before applying these procedures.

It is possible to trek (known as “crawling”) the web by following the links (upstream and downstream) around the previously cited pages of the 7 sets, exploiting the addition of the artificial pages to advantage the web pages linked to the pages which are more relevant with respect to the query.

Insofar as the pages having the best scores are presumed to be relevant to the user (and insofar as the relevance is transitive), the methods described here will be able to be applied recursively thereto to discover yet other relevant pages. It is thus possible to trek the web based on the user's query.

FIG. 7 diagrammatically presents such a method: the search for relevant pages can be applied recursively by extending the query with the “good pages found upstream”, “good pages found downstream”, “good hub pages” and “good authority pages” which in the figure are framed by rectangles. At each recursion, the scores of the best pages found become slightly lower (because each time the best pages found are added into the query) and the method stops when the scores cease to be sufficient.

A system implementing the relative distillation method described hereinabove is able to receive a search query composed of a set of URIs making it possible to access information resources such as web pages and to provide in response the URIs (or directly the pages) which are presumed to be the most relevant with respect to said query.

The query is composed, for example, of the favorite links of the user, and the goal of the system is, for example, to monitor the web around these links and to notify the user when new interesting pages appear therein, either by “Push” technology at the initiative of a server, or by “Pull” technology at the initiative of the user.

The user can of course provide the system directly with a set of URIs; nevertheless, other means may also be offered to him to assist him in the preparation and submission of a search query.

To trigger the execution of a search query from a hypertext link located in a page, the user can use any one of the devices from among the following:

-   -   A graphical object activatable for example by clicking (e.g. a button) is presented close to certain hypertext links (URI) in a web page. Its activation triggers the sending of a search query containing the URI in question.
    -   The system is furnished with a means able to toggle the page into a state where each click on a link triggers the execution of a search query (containing this link).
    -   A key of the keyboard, such as the “Ctrl” key, pressed while clicking (by a means of pointing) serves to trigger the execution of a search query from the link on which the cursor of the pointing means is positioned.
    -   The right-hand mouse button (or equivalent) serves to trigger the execution of a search query from the link on which the cursor of the mouse is positioned.
    -   Other analogous device.

Each of these devices can advantageously make it possible to execute said search query in addition to (in parallel with) access to the page designated by the link in question. The result of the search query will for example be displayed in a second window (new instance of the browser) or else in a subwindow of the browser⁵⁵. ⁵⁵ In a manner analogous to the subwindow existing today for favorites, this subwindow may be adjacent to the main subwindow in which the page containing the link that the user has clicked was displayed and in which the page accessed by the act of clicking on this link is subsequently displayed.

As a supplement to the link selected, other URIs may be added routinely into the search query⁵⁶. They may in particular be: ⁵⁶ Specifically, one of the essential advantages of the system is to be able to operate (find the relevant information resources) even if the search query is composed of a plurality of URIs.

-   -   the links located in the page, in the region of the URI selected;
    -   the URIs previously selected by the user for this same query in the course of his browsing⁵⁷; ⁵⁷ The new URIs found by the system are then highlighted in the result returned to the user (to distinguish them from the URIs which had already been returned in the same browsing).
    -   links explicitly envisaged and preferably determined by the designer of the page to accompany the URI selected;
    -   the URIs that another user (“mentor” or referent) considers to be very relevant with respect to the URI selected, the mentor being determined automatically by the system, or specified by the user himself (chosen from a list of “pals” that he has previously stored in the system), or else proposed by the page designer (the user can also choose from a list of “experts” proposed by the page designer).

Preparation of a Query:

We shall now describe how the user can prepare a query composed of several links that he gleans in the course of his browsing.

a) Displaying of the Current Query Under Preparation

Instead of triggering a search query directly, the user's action (as described above, for example the act of clicking on a link with the right-hand button and choosing the appropriate option) triggers the displaying of an accessory page in which:

-   -   in addition to the link that the user wishes to select⁵⁸, other links, that he has, as appropriate, previously selected for this same query, are presented; ⁵⁸ (As well as the links added routinely, as appropriate, as described hereinabove)
        -   boxes to be ticked may be displayed in association with each link presented, in such a way that the user can in particular select those links that will actually form the query;
    -   said accessory page is also furnished with an input means (such as a button) making it possible to launch the search query.

Thus the user can prepare a query gradually, by selecting links one after the other⁵⁹ during his browsing⁶⁰ and thereafter send a query composed of several URIs. ⁵⁹ (In one and the same page or in different pages) ⁶⁰ (During one and the same browsing or staggered over time)

Said accessory page may additionally contain drop-down graphical objects (such as for example directories, records, folders, or similar metaphor) representing queries under preparation other than the query in progress. The user can thus choose the query or queries which will be enriched by the new link that he has just selected.

Following the preparation of a query from a URI corresponding to a hypertext link in a page (as described above), the already existing queries which, as appropriate, contain this URI are optionally presented to the user.

Advantageously, said accessory page may be composed of two parts. One of these parts contains the elements described hereinabove (that is to say the elements of the query under preparation). The other part presents the content of the page designated by the link selected by the user.

For example, if the user clicks on a link while the page is in the state where all clicks trigger the displaying of the current query under preparation (or with the right-hand button of the mouse, etc.), the server returns said accessory page to him, which thus comprises:

-   -   in one part: the elements of the query under preparation
    -   and in the other part: the content of the page designated by the link clicked.

Thus, the use of the system represents an important advantage with respect to conventional browsing around the web: the user receives not only the page designated by the link that he has clicked (this is conventional web browsing), but at the same time he benefits from the possibility of sending a query (containing several URIs) to obtain yet other resources relevant in relation to this page.

As a variant, said accessory page is returned after fast (or even restricted⁶¹) execution of the search query in the course of which the link clicked was added. ⁶¹ In the case of a query regarding pages already crawled, the system can directly return the relevant URIs (or pages) already known and return the rest of the results later on.

The second page then directly contains a part of the result⁶². The user then receives not only the page designated by the link that he has clicked, but in addition he benefits directly from other resources relevant in relation to this page. ⁶² (For example in the form of a list of URIs or a set of vignettes representing these pages in miniature)

More advantageously still, said accessory page may be displayed in a subwindow⁶³ adjacent to the main subwindow of the browser. This adjacent subwindow opens in response to the action of the user who desires the displaying of the query under preparation (that is to say said accessory page).⁶⁴ ⁶³ (Analogous to the favorites subwindow of the current browsers) ⁶⁴ Note that, in parallel with the displaying of the query under preparation, the server can advantageously already begin to trek the web (crawling), that is to say construct R⁻, R⁻⁺, R⁻⁺⁻, R⁺, R⁺⁻ and R⁺⁻⁺ as already described, around the link selected.

The query under preparation can thus be displayed in parallel (asynchronously) with the displaying of the page designated by the link clicked; the latter page being displayed (independently) in the main subwindow.

The result of the search query can thereafter be presented in the same adjacent subwindow.

As mentioned previously, a (partial) result may possibly be returned after partial or restricted execution of the search query in progress, to which query the link clicked was added. The adjacent subwindow then directly presents a fast search result (which will possibly be supplemented subsequently).

b) Result of the Execution of a Search Query

For each search query, the server can return the results directly (for example returned from the HTTP query) or later on (for example by email).

The server returns the URIs (resulting from a query) in a page exhibiting the same structure as said accessory page (or said query under preparation), namely:

-   -   boxes to be ticked are associated with the links in such a way that the user can select those links that he likes and delete those he does not like⁶⁵ ⁶⁵ (That is to say ask the system to no longer suggest them)
        -   each URI⁶⁶ can thus be in at least one of the following states⁶⁷: suggested (default state), accepted or deleted (the URIs that are in the deleted state are not presented); ⁶⁶ Optionally, the presentation of the result of a search query includes the content of the pages (that are pointed at by the resulting URIs) for example in miniaturized form (vignettes). ⁶⁷ Subsidiarily, an option to copy (“freeze”) a page (locally or in a personal space on a server) may also be offered to the user. Each link can then be in one of the following states: suggested, accepted, deleted or frozen.
    -   the page is furnished with an input means (such as a button) making it possible to relaunch the search query.

The page returned also presents the other queries (from the same user) in the form of drop-down graphical objects, as already described. Their presentation may be hierarchized according to their relevance with respect to the link clicked (according to the relevance calculation methods described later).

The page returned presents means of control allowing the user to create new queries and to delete existing queries. Of course, the user can cut and paste URIs from existing queries or from any other resource. Also, when the result of a query is returned by the server, the user can shift (hive off) the URIs received into other queries. Each query is individually accessible by means of its own URI.

Maintaining the Spots

Described hitherto are several methods that use the relative distillation procedure, starting from a query (e.g. the given associated links of a spot) composed of a set of URIs, to determine and store relevant URIs (e.g. the completed associated links of a spot) with respect to this query, together with their relevance scores. These stored results are obtained on the basis of counting links located in the resources of the sets R⁻⁺, R⁻⁺⁻, R⁻⁺⁻, R⁺⁻⁻, R⁺⁻⁺, R⁺⁻⁺⁻⁶⁸ etc. which are themselves stored at least in part. Now, these sets vary over time (and the links located in the resources constituting these sets also vary). The stored data must therefore be kept up to date and the calculations must be redone when the data that they take as input vary significantly. ⁶⁸ R⁻⁺⁻, R⁺⁻, and R⁺⁻⁺ are in particular used to calculate the closeness of linked resources, and to filter, as described above, by taking the complement of this closeness as weighting for the counting of the links in question.

Moreover, it is desirable to disclose new relevant resources even before links pointing to them appear on the web. A method making it possible to do so will now be described.

For each query (for example for each spot),

-   -   select a first set of resources having the largest relevance scores (such as the largest hub scores) for said query,
    -   determine the relevant regions (that is to say the regions possessing links to resources whose scores are high on average) of said first set of resources having the largest relevance scores,
    -   monitor the new links which appear in said relevant regions and which point to new resources (that is to say to resources that were not yet known to the system),
    -   select a second set of resources having a high relevance score (such as the authority score) for said query,
    -   select the new resources which are the most similar to the resources of said second set of resources and give the new resources selected a time-dependent authority score (as described hereinbelow) as a function of their similarity to the resources of said second set of resources.

The similarity of a resource with respect to other resources is determined by comparing their contents. Described hereinbelow is the way to determine the similarity as a function of the distribution of the words in the resources in question.

Time-Dependent Authority Score:

Each new authority resource has a hypertext authority score (a_(ht)) and a similarity authority score (a_(s)). Let τ be the ratio between

-   -   the time remaining in order for the resource in question to no longer be considered to be new
    -   and the total duration of newness (that is to say the total duration for which a resource which has just been discovered by the system is considered to be new).

τ is therefore a number equal to 1 at the start of the life of a resource in the system, and decreases linearly until it reaches 0 at the moment at which the resource in question is said to be old.

Thus τ is used as a weighting to go gradually from a similarity score to a hypertext score and the formula for the global score is

a = τ a_(s) + τ′ a_(ht) (with τ′ = 1 − τ).

As the distribution of the words of a new resource varies in principle less than the hypertext links which point to it, a_(s) is considered to be constant while a_(ht) must be updated over time. Thus the score a_(s) must be calculated at the moment at which the new resource is discovered, and for all the queries for which it is in a relevant region, until it becomes old (thus if a link to this resource appears in a relevant region after it has become old, then its similarity with the resources of said second set will not be determined).
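The time-dependent score can be illustrated by the following Python sketch. The time unit and the names `newness_ratio`, `global_authority`, `age` and `newness_duration` are hypothetical; the formula itself is the one given above.

```python
def newness_ratio(age, newness_duration):
    """tau: remaining newness time over total newness duration;
    1 when the resource is discovered, 0 once it is old."""
    remaining = max(newness_duration - age, 0.0)
    return min(remaining / newness_duration, 1.0)

def global_authority(a_s, a_ht, age, newness_duration):
    """a = tau * a_s + (1 - tau) * a_ht."""
    tau = newness_ratio(age, newness_duration)
    return tau * a_s + (1.0 - tau) * a_ht

# Hypothetical resource considered new for 30 days.
for age in (0, 15, 30, 45):
    print(age, global_authority(a_s=0.8, a_ht=0.2, age=age, newness_duration=30))
```

In such a sketch a_(s) would be computed once at discovery, while a_(ht) would be refreshed each time the hypertext scores are recalculated.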

Similarity:

An absolute distillation algorithm will be used to determine the score a_(s) of each new resource.

The known method of absolute distillation over a set of nodes connected by links (thus forming an oriented graph) comprises the following steps:

-   1. allocate each node a hub score equal to 1 and an authority score,
-   2. for each node calculate its authority score by adding up the hub scores of the nodes which point to it, then normalize the authority scores in such a way that their total is equal to 1,
-   3. for each node calculate its hub score by adding up the authority scores of the nodes to which it points, then normalize the hub scores in such a way that their total is equal to 1,
-   4. reiterate by restarting from step 2 until the algorithm converges, that is to say until the scores are no longer significantly different with respect to the previous step.

In addition, here the links are weighted by the similarities of the resources in question with respect to the distribution of their words. Steps 2 and 3 are replaced by the following:

-   2′. for each node calculate its authority score by adding up the hub scores of the nodes which point to it, multiplied by the weights of the respective links, then normalize the authority scores in such a way that their total is equal to 1,
-   3′. for each node calculate its hub score by adding up the authority scores of the nodes to which it points, multiplied by the weights of the respective links, then normalize the hub scores in such a way that their total is equal to 1.

The weight of the similarity link between two resources is equal to the scalar product of their distributions of words (that is to say to the sum, for each word located in the two resources, of the product of the frequencies of this word in these resources; the resulting sum is a number between zero, in the case where there is no word in common, and 1, in the case where the two resources have the same content) after having removed the nonpertinent words (“stop words”).
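A minimal Python sketch of such a similarity weight is given below. For the scalar product to equal 1 when two resources have the same content, the sketch assumes that each word-frequency vector is scaled to unit Euclidean length (which makes the weight a cosine similarity); this scaling, the tokenizer, the tiny `STOP_WORDS` list and the function names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list only; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}

def word_distribution(text):
    """Word frequencies after stop-word removal, scaled to unit length."""
    words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
             if w not in STOP_WORDS]
    counts = Counter(words)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}

def similarity_weight(text_a, text_b):
    """Scalar product of the two distributions over their common words."""
    dist_a, dist_b = word_distribution(text_a), word_distribution(text_b)
    return sum(f * dist_b[w] for w, f in dist_a.items() if w in dist_b)

print(similarity_weight("web pages and links", "links between web pages"))
```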

It should be noted that the similarity links thus obtained arebidirectional.

The absolute distillation can thus be performed over the set of resources comprising:

-   -   the new resource discovered,
    -   and said second set of resources having high relevance scores,

to determine the score a_(s) of this new resource discovered.
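Steps 1, 2′, 3′ and 4 can be sketched in Python as follows. The sketch assumes that the similarity links are supplied as a dictionary of weights keyed by (source, target) pairs, each bidirectional link being entered in both directions; the convergence threshold, the iteration cap and the names used are hypothetical.

```python
def absolute_distillation(weights, max_iter=200, eps=1e-9):
    """Weighted hub/authority iteration over directed, weighted links."""
    nodes = sorted({n for edge in weights for n in edge})
    hub = {n: 1.0 for n in nodes}    # step 1: hub scores start at 1
    auth = {n: 1.0 for n in nodes}
    for _ in range(max_iter):
        # Step 2': authority = weighted sum of hub scores of citing nodes.
        new_auth = {n: 0.0 for n in nodes}
        for (src, dst), w in weights.items():
            new_auth[dst] += hub[src] * w
        total = sum(new_auth.values()) or 1.0
        new_auth = {n: v / total for n, v in new_auth.items()}
        # Step 3': hub = weighted sum of authority scores of cited nodes.
        new_hub = {n: 0.0 for n in nodes}
        for (src, dst), w in weights.items():
            new_hub[src] += new_auth[dst] * w
        total = sum(new_hub.values()) or 1.0
        new_hub = {n: v / total for n, v in new_hub.items()}
        # Step 4: stop when the scores no longer change significantly.
        change = sum(abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n])
                     for n in nodes)
        hub, auth = new_hub, new_auth
        if change < eps:
            break
    return hub, auth

def bidirectional(similarities):
    """Enter each similarity link in both directions."""
    return {**similarities, **{(b, a): w for (a, b), w in similarities.items()}}

# Hypothetical similarities between a new resource and two known ones.
sims = bidirectional({("new", "r1"): 0.8, ("new", "r2"): 0.2, ("r1", "r2"): 0.5})
hub, auth = absolute_distillation(sims)
print(auth["new"])  # candidate value for the score a_s of the new resource
```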

The methods described above also make it possible to select, from amonga set of extra resources, a resource which is the most relevant withrespect to a starting resource.

Accordingly, the following three steps are implemented:

-   (a) selection from the web of resources that are most similar to the    starting resource (typically a private resource), by one of the    procedures of the invention,-   (b) selection from the web of resources that are the most relevant    with respect to the resources selected in step (a), and-   (c) selection of extra resources (typically of private resources    again) that are the most similar to the most relevant resources    selected in step (b).

Such a method makes it possible in particular to dynamically generate the content of web pages published as a function of context.

1. A method for managing information resources in a computer system for the purpose of resource retrieval, said resources including a first resource to be retrieved and a second resource obtained independently from said first resource and having a potential relevance relationship with other resources based on a relevance scoring process, the method comprising: a) receiving user information from a user input device, said user information being representative of a declaration that said first resource is associated with said second resource for the purpose of being later retrieved, and storing information relative to this declaration; b) when selecting said second resource by a user input device: b1) based on said stored information, further displaying an indicator of the existence of said first resource, c) when selecting another resource: c1) determining whether said other resource is relevant with respect to said second resource, c2) if step c1) has determined a relevance between said other resource and said second resource, based on said stored information, further displaying an indicator of the existence of said first resource, d) retrieving said first resource utilizing said indicator displayed when selecting said second resource or said other resource, wherein said first resource is retrievable although it initially had no connection with the second resource.
2. The method as claimed in claim 1, wherein said user information received in said step a) is representative of a declaration that said first resource is associated with several second resources, all said second resources being obtained independently from said first resource.
3. The method as claimed in claim 1, wherein said second resource comprises a group of resources, and wherein said relevance scoring process finds other resources based on an input including said group of resources.
4. The method as claimed in claim 3, wherein said group of resources comprises resources derived from a browsing context.
5. The method as claimed in claim 3, wherein said group of resources forms a spot of resources.
6. The method as claimed in claim 3, wherein said relevance scoring process comprises implementing a search engine based on the analysis of links between various resources based on an input query comprising a series of resource identifiers designating the resources of said group.
7. The method as claimed in claim 3, wherein said other resources belong to a plurality of resources forming a browsing context.
8. The method as claimed in claim 1, wherein said step c1) is performed by comparing a relevance score with a threshold.
9. The method as claimed in claim 1, wherein said step c1) is performed by using relevance data previously obtained by said relevance scoring process performed between said other resource and said second resource.
10. The method as claimed in claim 1, wherein said step c1) is performed by performing said relevance scoring process between said other resource and said second resource once said other resource has been selected.
11. The method as claimed in claim 1, wherein said indicator comprises a link to said first resource.
12. The method as claimed in claim 1, wherein said step a) comprises receiving information from said user input device which is a pointing input device, said information being representative of actions made with said pointing input device on displayed graphical objects representative of said first resource and said second resource.
13. The method as claimed in claim 1, wherein said step a) further comprises the storage in a user associative memory of information representative of an association between the first and second resources.
14. The method as claimed in claim 1, wherein said first resource is a personal file, and said second and other resources are web pages.
15. A method for managing information resources in a computer system for the purpose of resource retrieval, said resources including a first resource to be retrieved and a second resource obtained independently from said first resource and having a potential relevance relationship with other resources based on a relevance scoring process, the method comprising: a) receiving user information from a user input device, said user information being representative of a declaration that said first resource is associated with said second resource for the purpose of being later retrieved, and storing information relative to this declaration; b) when accessing said second resource by a user input device: b1) displaying said second resource, b2) based on said stored information, further displaying an indicator of the existence of said first resource, c) when accessing another resource, by a user input device: c1) determining whether said other resource is relevant with respect to said second resource, c2) displaying said other resource, and c3) if step c1) has determined a relevance between said other resource and said second resource, based on said stored information, further displaying an indicator of the existence of said first resource, d) retrieving said first resource utilizing said indicator displayed when accessing said second resource or said other resource, whereby said first resource is retrievable although it initially had no connection with the second resource.
16. The method as claimed in claim 15, wherein said user information received in said step a) is representative of a declaration that said first resource is associated with several second resources, all said second resources obtained independently from said first resource.